Source: CORNELL UNIVERSITY submitted to NRP
AGRICULTURAL ANALYTICS DATA PLATFORM
Sponsoring Institution
National Institute of Food and Agriculture
Project Status
COMPLETE
Funding Source
Reporting Frequency
Annual
Accession No.
1011323
Grant No.
(N/A)
Cumulative Award Amt.
(N/A)
Proposal No.
(N/A)
Multistate No.
(N/A)
Project Start Date
Nov 4, 2016
Project End Date
Sep 30, 2019
Grant Year
(N/A)
Program Code
[(N/A)]- (N/A)
Recipient Organization
CORNELL UNIVERSITY
(N/A)
ITHACA, NY 14853
Performing Department
Applied Economics & Management
Non Technical Summary
The increased availability of high-resolution environmental, climate, and economic data, coupled with rapid growth in cheap computing power, has spurred enormous interest in using such data in large-scale empirical applications for agricultural economics, climate change, and agricultural finance policy research, among others. Precision agriculture, which interlinks various technologies and data sources and itself generates large amounts of data, has also increased our ability to manage and use site-specific data. State-of-the-art cyber-infrastructures and innovative computational tools are of fundamental importance to all realms of sustainability science. To advance our understanding of the complex dynamics of the Earth system, and the multidimensional relationships that humans have with that system, there is general agreement that available datasets must be collected, validated, analyzed, visualized, and synthesized. Expanding on our proof-of-concept efforts, the purpose of this grant is to develop an open ag-analytics data platform for applied research and research-based tools.
Animal Health Component
25%
Research Effort Categories
Basic
25%
Applied
25%
Developmental
50%
Classification

Knowledge Area (KA) | Subject of Investigation (SOI) | Field of Science (FOS) | Percent
601 | 6199 | 2090 | 100%
Goals / Objectives
The availability of big data in food systems, climate, and economics, coupled with the expansion of computing power, has spurred enormous interest in analytics. Yet our ability to manage and use such data has not kept pace. Expanding on our proof-of-concept efforts, the purpose of this grant is to develop an open ag-analytics data platform for applied research and research-based tools. Pilot development efforts for this platform have been ongoing, and in the course of that work we have attracted considerable support and many users. We also have a project with the USDA Chief Economist Office for developing use cases, and have been working with others, including the Meridian Institute, to try to obtain the necessary funding and cooperation from the agencies, but further development is needed.
Project Methods
The research approach will revolve around three primary efforts: development of the data infrastructure (sourcing jobs, design, and construction); development of outward-facing tools (point-and-click tools, documentation, and programmatic interfaces/APIs for research); and development of community resources, users, and collaborators. For the web, we primarily use ASP/C#, a well-developed, industrial, and widely employed web framework. Our back-end database is Microsoft SQL Server for our main app, but we also have a CKAN installation, used as an interactive data catalog, which uses PostGIS, as does our GeoServer installation. SQL Server is a relational database management system (RDBMS) developed by Microsoft whose primary function is to store and retrieve data as requested by other software applications, whether on the same computer or running across a network. Every database platform has advantages and disadvantages. In our initial efforts, we have found SQL Server to have the advantage of being a mature industrial RDBMS appropriate for handling the loads we are currently putting on it. The disadvantages are that it is a traditional relational platform and a commercial product. In the long run we expect to move toward NoSQL platforms such as MongoDB or the Hadoop framework, but thus far have not found that necessary. We also use a variety of open-source and other tools for data processing, including GIS tools (e.g., GDAL/OGR), OpenLayers, Python, and others.
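As a rough illustration of a relational store answering requests from other software, the sketch below uses Python's built-in sqlite3 module as a stand-in for SQL Server. The table name, column names, the `prices_for` helper, and the price values are all hypothetical; none of them come from the Ag-Analytics schema or data.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with one hypothetical price table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE commodity_price ("
        " commodity TEXT NOT NULL,"
        " price_date TEXT NOT NULL,"
        " price REAL NOT NULL)"
    )
    # Made-up rows, used only to exercise the query below.
    conn.executemany(
        "INSERT INTO commodity_price VALUES (?, ?, ?)",
        [
            ("corn", "2017-01-03", 3.55),
            ("corn", "2017-01-04", 3.60),
            ("milk", "2017-01-03", 17.40),
        ],
    )
    return conn


def prices_for(conn: sqlite3.Connection, commodity: str) -> list:
    """Return (date, price) rows for one commodity, oldest first."""
    cur = conn.execute(
        "SELECT price_date, price FROM commodity_price"
        " WHERE commodity = ? ORDER BY price_date",
        (commodity,),  # parameterized to avoid SQL injection
    )
    return cur.fetchall()


conn = build_demo_db()
corn_rows = prices_for(conn, "corn")
```

A programmatic interface of the kind mentioned above would typically wrap a helper like `prices_for` behind a web endpoint; the parameterized query is the part that carries over regardless of the database engine.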

Progress 11/04/16 to 09/30/19

Outputs
Target Audience: The target audience of this project includes researchers and policymakers who are interested in food systems, economics, climate, and agriculture-related questions. Products from this project also support farmers' decision making, as they can view data and use decision support tools on the website.
Changes/Problems: Nothing Reported
What opportunities for training and professional development has the project provided? The databases were used in undergraduate and graduate-level classes at Cornell to help students understand the management and application of cloud storage. The diverse data on the platform helped students develop financial, statistical, and agricultural models and analytical tools in an industrial setting. The platform and insurance premium calculators were also introduced to extension professionals and farmers through workshops and presentations to help meet their data needs.
How have the results been disseminated to communities of interest? Results were presented in national and international seminars and workshops to reach our target audience. For example, the study of implied volatility discovery methodology in the federal crop insurance program was presented at the American Risk and Insurance Association 2018 Annual Meeting in Chicago, IL, on August 8, 2018. The study was also presented at the 2018 China Agricultural Economic Review (CAER) and International Food Policy Research Institute (IFPRI) Annual Conference in Guangzhou, China, on November 8, 2018. This study used data and methods developed as part of this project.
What do you plan to do during the next reporting period to accomplish the goals? Nothing Reported

Impacts
What was accomplished under these goals? We consolidated the SQL tables and developed data tools for the USDA Office of the Chief Economist (OCE) to facilitate their research, and collaborated with the OCE on the transportation study using the data platform. The data tools and research component provide a better understanding of the relationship between transportation and farmers' welfare. The data platform was also used to create insurance decision tools. Farmers can use the online insurance calculators to estimate their insurance premiums and expected profit under different insurance options. The insurance decision tools help farmers digest Federal crop insurance policies and understand how to use crop insurance to mitigate agricultural risks. The data platform was further applied to a series of research projects. It also makes analysis and reporting more convenient for extension staff at Cornell University and researchers at other universities. Cornell staff built up substantial data management skills in this project. These skills and experience have been applied to related projects, such as the USDA Food Environment Data System, to create online data and visualization tools.

Publications

  • Type: Journal Articles Status: Published Year Published: 2018 Citation: Woodard, J.D. and Yi, J., 2018. Estimation of insurance deductible demand under endogenous premium rates. Journal of Risk and Insurance.


Progress 10/01/17 to 09/30/18

Outputs
Target Audience: The target audience is researchers, research institutes, government agencies, and farmers.
Changes/Problems: The original PI (Joshua Woodard) left the university, and a new PI (Jennifer Ifft) has taken over the project. The Ag-Analytics server moved with the original PI, but we are building a new server to host the data.
What opportunities for training and professional development has the project provided? The data platform was introduced and used in the classroom. In courses AEM 4070 and 6070 at Cornell University, students learned data management by using the platform, giving them the opportunity to become familiar with an industrial data platform and to practice with a SQL database in the cloud.
How have the results been disseminated to communities of interest? We created online instructions and interactive tools to help users understand how to use the data platform.
What do you plan to do during the next reporting period to accomplish the goals? We are collaborating with USDA-ERS on the Food and Environmental Data System project, which hosts a comprehensive database for food and environment-related research projects and is an extension of the data we currently have.

Impacts
What was accomplished under these goals? We created webpages and tools for the USDA Chief Economist Office to facilitate their research, and collaborated with the Chief Economist Office on the transportation study using the data platform. The data platform is used to develop risk management tools that help farmers understand how to use crop insurance to mitigate agricultural risks. The data platform is also used in a series of research projects, and it makes analysis and reporting more convenient for extension staff at Cornell University and researchers at other universities.

Publications

  • Type: Conference Papers and Presentations Status: Other Year Published: 2017 Citation: Woodard, J.D. and Jing Yi, Analysis of Implied Volatility Discovery Methodology in the Federal Crop Insurance Program, presented at the American Risk and Insurance Association 2018 Annual Meeting, Chicago, IL, August 5-8, 2018.
  • Type: Journal Articles Status: Published Year Published: 2018 Citation: Woodard, Joshua D., and Jing Yi. "Estimation of insurance deductible demand under endogenous premium rates." Journal of Risk and Insurance (2018).


Progress 11/04/16 to 09/30/17

Outputs
Target Audience: The primary beneficiaries will be stakeholders in NY (farmers, food processors, government agencies, cooperative banks, etc.) who would benefit from better information and better access to data. Farmers will benefit from new products and practical analytic tools that let them access the data and models in a meaningful way. Researchers will also benefit from better data systems for doing their work and building analytics tools. There is an emphasis on food systems, economic, climate, and agricultural data. We will also provide access to farmers and other stakeholders for beta testing of systems. Other efforts could include building data systems to crowdsource data directly from farmers, restaurants, and food processors, which could then be used in developing tools for pricing and marketing, early warning systems, and other systems that would benefit from integration with large-scale but difficult-to-access data already available (e.g., weather, soil, and production databases). We have engaged other colleagues to facilitate tool development based on our platform in order to facilitate translation of research into actionable analytics.
Changes/Problems: We refactored the ETL job that collects data from the National Agricultural Statistics Service (NASS) so that, besides scraping and storing the NASS data, it also tracks the weekly changes made in NASS Animals & Products, Crops, Demographics, and Economics. We created an automated unit tester to compare insurance premiums estimated by the Ag-Analytics insurance tool and RMA's web tool, fixed bugs in the Ag-Analytics premium calculator, and provided suggestions and feedback to RMA. The Ag-Analytics insurance tool can also export the intermediate parameters for each premium estimate, and it gives users the option to include the administrative fee, which is not offered on RMA's web tool.
We made some major changes to the Ag-Analytics insurance tool, such as incorporating the high-risk endorsement and showing pop-up windows for warnings. We added modules for the Whole-Farm Revenue Protection (WFRP) policy; the Ag-Analytics WFRP insurance tool gives beginning farmers more flexibility than RMA's tool. We added API documentation in different formats (a Swagger webpage and Word documents) to serve audiences with different backgrounds. The Ag-Analytics data and insurance APIs are also used in undergraduate and graduate courses to help students understand the applications of APIs in data analysis.
We fixed the existing scripts for grain transportation data, the Commitments of Traders (COT) reports, weekly agricultural commodity exports, net sales, and outstanding sales data, weekly grain transportation report data, ethanol plant data, RMA data, etc. We made many changes to our existing ETL jobs because of changes in the source data, and contacted various source data providers. We revised our metadata and readme files to make them more convenient for our users and researchers. We integrated and visualized the transportation, grain price, and weather data to demonstrate the applications of the Ag-Analytics data warehouse.
We also added new ETL jobs, such as:
  • Farm Payment Information (Direct Payment): We automated the process of collecting the records on farm payments made to agricultural program participants published by the Farm Service Agency (FSA). Source data: https://www.fsa.usda.gov/FSA/webapp?area=newsroom&subject=landing&topic=foi-er-fri-pfi
  • U.S. Farmer Information: We scrape U.S. farmers' name and address information from the Farm Service Agency (FSA). Source data: https://www.fsa.usda.gov/FSA/webapp?area=newsroom&subject=landing&topic=foi-er-fri-pfi
  • Daily Agroclimate Data: Daily agroclimate data for the U.S. on a one-degree latitude by one-degree longitude grid, containing major agroclimate features such as average/maximum/minimum air temperature, radiation, dew/frost point temperature, and wind speed. A lookup table was created for the data for users' convenience. Source data: https://power.larc.nasa.gov/cgi-bin/cgiwrap/solar/agro.cgi
  • Lock Performance Monitoring Data: Real-time lock performance reports, such as the lock queue, tonnage, and traffic reports published by the U.S. Army Corps of Engineers. Source data: http://corpslocks.usace.army.mil/lpwb/f?p=121:7:0:
  • Crop Insurance Data: We collect crop insurance data published by the Risk Management Agency (RMA), such as the RMA Summary of Business Data, RMA Summary of Business Data with coverage level, and Crop Insurance Summary Data with type/practice/unit structure. Source data: https://www.rma.usda.gov/data/sob/scc/index.html
What opportunities for training and professional development has the project provided? We have hosted several workshops and given several talks on the ag-analytics.org open-source system, and also funded a postdoctoral research associate on the project.
How have the results been disseminated to communities of interest? The results have been disseminated via in-person presentations and workshops as well as online. We have built an extensive data portal and querying tools that can be used by researchers and analysts. We also receive and respond to many emails from users to help them make better use of the tools.
What do you plan to do during the next reporting period to accomplish the goals? We plan to continue adding data sets and refining our data interface. A major goal is also to prepare a final paper for "Science - Data" or a similar journal to explain the collection.
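The weekly change tracking described for the NASS ETL job under Changes/Problems can be sketched roughly as a comparison of two snapshots. This is a minimal illustration, not the project's actual job: the `diff_snapshots` function, the (commodity, year, state) key, and the values are all hypothetical.

```python
def diff_snapshots(previous: dict, current: dict) -> dict:
    """Compare two snapshots keyed by record id and classify the changes."""
    # Records present only in the newer snapshot.
    added = {k: current[k] for k in current.keys() - previous.keys()}
    # Records that disappeared from the source.
    removed = {k: previous[k] for k in previous.keys() - current.keys()}
    # Records present in both snapshots whose value was revised.
    revised = {
        k: (previous[k], current[k])
        for k in previous.keys() & current.keys()
        if previous[k] != current[k]
    }
    return {"added": added, "removed": removed, "revised": revised}


# Made-up snapshots keyed by a hypothetical (commodity, year, state) tuple.
last_week = {("CORN", 2016, "NY"): 100.0, ("SOYBEANS", 2016, "NY"): 40.0}
this_week = {("CORN", 2016, "NY"): 101.5, ("WHEAT", 2016, "NY"): 60.0}

changes = diff_snapshots(last_week, this_week)
```

Storing the `changes` result alongside each weekly load, rather than overwriting the previous data, is one way to keep a history of revisions made at the source.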

Impacts
What was accomplished under these goals?
- Ag-Analytics Data Catalogue: hosts Ag-Analytics datasets that are highly available online and securely protected.
- Ag-Analytics Developer tools: built on top of the Ag-Analytics Data Catalogue to provide an intuitive, easy-to-use UI for researchers to retrieve information from the catalogue. The tools are flexible in that users can join different data sets together in a single query. Without these tools, it would take researchers considerable time and effort to extract the information that supports their research.
- Advanced logging system: we deployed a site-wide logging system that lets us extract reports on which parts of the service are used most and how users interact with the tools. These reports help us identify which features need to be improved and scaled to give users the best experience.
- Open-source documentation of all the APIs.
- Detailed descriptions on the data catalog page of all the public database fields, including commodity futures prices, milk prices, daily PRISM weather data, and so on.
- A search query tool in which researchers can customize their queries according to the results they need to view.
- A query limit: users are limited to a certain number of database queries, after which they have to log in to the Ag-Analytics system.
- Usage analysis: we can determine how many users ran which queries in the query tool and the API documentation, and feed that analysis back so that later users can view it when they query.
- Swagger API documentation for the public Ag-Analytics database.
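The query limit and usage tracking described above could be sketched along the following lines. This is an assumption-laden illustration rather than the platform's implementation: the `QueryGate` class, the client and query names, and the limit of two anonymous queries are all hypothetical, since the report does not state the actual limit or how it is enforced.

```python
from collections import Counter


class QueryGate:
    """Allow a fixed number of anonymous queries per client, then require login."""

    def __init__(self, anonymous_limit: int) -> None:
        self.anonymous_limit = anonymous_limit
        self.anonymous_counts = Counter()  # queries run per anonymous client
        self.query_usage = Counter()       # how often each named query was run

    def allow(self, client_id: str, query_name: str, logged_in: bool = False) -> bool:
        """Return True and record the query if the client may run it."""
        if not logged_in:
            if self.anonymous_counts[client_id] >= self.anonymous_limit:
                return False  # anonymous quota used up; caller should prompt for login
            self.anonymous_counts[client_id] += 1
        self.query_usage[query_name] += 1
        return True


gate = QueryGate(anonymous_limit=2)  # hypothetical limit
results = [gate.allow("visitor-1", "milk_prices") for _ in range(3)]
after_login = gate.allow("visitor-1", "milk_prices", logged_in=True)
most_used = gate.query_usage.most_common(1)
```

The `query_usage` counter plays the role of the usage reports: it records which queries are run most often, which is the kind of summary that could be shown back to later users.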

Publications

  • Type: Journal Articles Status: Published Year Published: 2016 Citation: Woodard, J., 2016. Big data and Ag-Analytics: An open source, open data platform for agricultural & environmental finance, insurance, and risk. Agricultural Finance Review.