Performing Department
(N/A)
Non Technical Summary
In coming decades, climate change will jeopardize production of staple crops in the U.S., highlighting the need for sustainable intensification and improved climate-crop response prediction. This project seeks to address this challenge by applying machine learning to develop multi-scale (within-field to national scales), pixel-based biophysical suitability maps for corn (Zea mays) and soybean (Glycine max). Using an extensive yield monitor dataset (over 60 farms throughout the U.S., spanning 3-10 years per farm), a model will be trained to predict relative yield based on climate data from the Parameter-elevation Regressions on Independent Slopes Model (PRISM), soil data from the Soil Survey Geographic Database (SSURGO), and digital elevation models. Additionally, the model will be used to predict suitability under climate change scenarios over the next three decades using data from the Coupled Model Intercomparison Project. This project will result in novel crop suitability models for corn and soybean which, coupled with predictive weather trend tools, will allow for improved response to stochastic weather events. Dynamic suitability maps from this project will be made available on cloud-based applications for research and agricultural producer planning. This fellowship will support the professional development of the applicant and enable him to conduct research that will address the challenge of sustainable intensification and advance the state of agricultural climate change adaptation.
Animal Health Component
40%
Research Effort Categories
Basic
40%
Applied
40%
Developmental
20%
Goals / Objectives
The main goal of this project is to apply machine learning methods to the analysis of "Big" agricultural data to extract crop-specific suitability information relevant to sustainable agricultural intensification and climate change adaptation. This project will address three main research objectives: (1) develop a machine learning model that can estimate within-field crop suitability based on yield monitor data for U.S. corn and soybean by considering site-specific climate, soil, and terrain conditions, (2) expand the machine learning model to predict crop suitability for the entire U.S. given climate, soil, and terrain conditions, and (3) use this national-scale suitability model to predict crop suitability response to climate change using representative concentration pathways described in the Coupled Model Intercomparison Project. The specifics of each objective are discussed below.
Objective 1: Develop machine learning models that can estimate within-field crop suitability on yield monitor farms for corn and soybean based on climate, soil, and terrain conditions.
Sub-objective 1.1: Cleaning and preprocessing of the yield monitor dataset from over 60 U.S. farms spanning 3-10 years per farm, collected as part of the DIFM research project.
Sub-objective 1.2: Cleaning and preprocessing of the soil, climate, and landscape terrain factors that will be used as predictive variables in the models.
Sub-objective 1.3: Model selection and feature selection. Three well-established ML models will be evaluated and compared for final use in this study: Random Forest, Support Vector Machine, and Artificial Neural Network. Model selection will be conducted based on interpretability, accuracy, and data and computational requirements. If feasible based on the amount and quality of data, preference will be given to a deep learning neural network model.
Sub-objective 1.4: Model training and cross-validation. Both classification and regression approaches will be explored.
Sub-objective 1.5: Model evaluation, including quantitative error and accuracy metrics.
Objective 2: Expand the machine learning model to predict crop suitability for the entire United States given climate, soil, and terrain conditions.
Sub-objective 2.1: Cleaning and processing of national-scale soil, climate, and terrain data.
Sub-objective 2.2: Iterative scaling of the model from Objective 1 to be operational at the national scale.
Sub-objective 2.3: Model training and predictive mapping of U.S. crop suitability.
Sub-objective 2.4: Model evaluation and accuracy assessment using DIFM yield monitor data and county-level yields from the National Agricultural Statistics Service.
Sub-objective 2.5: Share validated suitability maps with producers and house them on a web-based tool hosted by USDA or using Google Earth Engine Apps.
Objective 3: Use the national-scale suitability model to predict crop suitability response to climate change using representative concentration pathways described in the Coupled Model Intercomparison Project.
Sub-objective 3.1: Clean and process climate data from Coupled Model Intercomparison Project Phase 5 (CMIP5) climate projections for input into the machine learning model.
Sub-objective 3.2: Evaluation of the impact of each RCP scenario on crop suitability. Special attention will be given to changing spatial patterns of crop suitability to inform agricultural planning.
Sub-objective 3.3: Development of maps and evaluation of areas of high risk and opportunities for climate adaptation.
Sub-objective 3.4: Development of a web-based app using Google Earth Engine that allows dynamic updating of suitability predictions based on new climate data.
Data dissemination objectives:
Development of a web app in Google Earth Engine in consultation with extension professionals.
Drafting of 2-3 manuscripts for publication in peer-reviewed journals.
Project Methods
The overarching goal of this project is to apply state-of-the-art machine learning (ML) methods to the analysis of "Big" agricultural data to extract crop-specific suitability information relevant to sustainable agricultural intensification and climate change adaptation. To accomplish this goal, this project will address three main research objectives: (1) develop a machine learning model that can estimate within-field crop suitability based on yield monitor data for corn and soybean considering site-specific climate, soil, and terrain conditions, (2) expand the machine learning model to predict crop suitability for the entire U.S. given climate, soil, and terrain conditions, and (3) use this national-scale suitability model to predict crop suitability response to climate change using representative concentration pathways described in the Coupled Model Intercomparison Project. The specifics of each objective are discussed below.
This project will use a novel empirical approach to suitability modeling that leverages a large-scale yield monitor dataset from over 60 U.S. farms spanning 3-10 years per farm, collected as part of the DIFM research project. Yield monitor data will be cleaned and interpolated to raster datasets at the highest resolution possible. Crop-specific yields will be scaled between 0 and 1 across all years to represent corn and soybean suitability in terms of relative yield. These crop-specific relative yield values will serve as the response variables in the machine learning models. This empirical approach is novel for suitability modeling because it is based on observational data rather than on mechanistic assumptions about crop responses to the environment. Soil, climate, and landscape terrain factors will be used as predictive variables in the models. Soil sampling was conducted as part of the DIFM project and includes soil texture, pH, cation exchange capacity, soil nutrients, and soil organic carbon. 
Other important soil variables such as soil depth and parent material will be derived from the 10-meter gridded Soil Survey Geographic Database (SSURGO). Climate variables will be derived from 800-meter PRISM data (PRISM Climate Group, Oregon State University) and will include temperature (mean, minimum, and maximum), precipitation (mm), dew point temperature, and vapor pressure deficit. Terrain variables will be derived from 1- and 10-meter digital elevation models (DEMs) for each area and will include elevation, slope, aspect, channel depth, valley depth, topographic wetness index, and fractal component. All data will be processed to the same geographic coordinate system, resampled to the same resolution, and associated with yield outcomes using Google Earth Engine and the R software environment.
Three well-established ML models will be evaluated and compared for final use in this study: Random Forest, Support Vector Machine, and Artificial Neural Network. Model selection will be conducted based on interpretability, accuracy, and data and computational requirements. If feasible based on the amount and quality of data, preference will be given to a deep learning neural network model. Deep learning models have gained significant attention for their strong ability to predict yield outcomes and improve agricultural practices when supplied with sufficient data. Both a classification and a regression modeling approach will be examined for efficacy. In the classification approach, the deep learning model will be used to classify crop suitability into five ranks (1=Poor, 2=Marginal, 3=Fair, 4=Good, 5=Optimal). The ranks will be determined by equal-interval splits of relative yield values (for example, a relative yield of 0-0.2 = Poor, while a relative yield of 0.8-1.0 = Optimal). In the regression approach, the model will instead predict suitability as a continuous index value ranging between 0 and 1, based directly on the calculated relative yield values. 
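As a rough illustration of the relative-yield scaling and equal-interval ranking described above, the following minimal Python sketch may help. It assumes min-max scaling (the text says only that yields are scaled between 0 and 1) and assigns a value that falls exactly on a class boundary to the higher rank; the function names and the sample yields are hypothetical and are not part of the project's code.

```python
from typing import List

# Rank labels taken from the five-class scheme in the text (1=Poor ... 5=Optimal).
LABELS = ["Poor", "Marginal", "Fair", "Good", "Optimal"]


def relative_yield(yields: List[float]) -> List[float]:
    """Scale raw yields to [0, 1]. Min-max scaling is an assumption here."""
    lo, hi = min(yields), max(yields)
    if hi == lo:
        raise ValueError("yields have no variation; relative yield is undefined")
    return [(y - lo) / (hi - lo) for y in yields]


def suitability_rank(rel: float, n_classes: int = 5) -> int:
    """Map a relative yield in [0, 1] to a rank 1..n_classes using equal-width bins."""
    if not 0.0 <= rel <= 1.0:
        raise ValueError("relative yield must lie in [0, 1]")
    # A relative yield of exactly 1.0 would index past the last bin, so clamp it.
    return min(int(rel * n_classes), n_classes - 1) + 1


# Hypothetical pixel yields (for example, bushels per acre) pooled across years.
sample_yields = [100.0, 130.0, 200.0, 170.0, 110.0]
rel = relative_yield(sample_yields)
ranks = [suitability_rank(r) for r in rel]
rank_labels = [LABELS[k - 1] for k in ranks]
```

In the regression approach, the values in `rel` would be used directly as the response variable; in the classification approach, `ranks` would be used instead.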
A feature selection procedure using recursive feature elimination will be used to remove variables that do not contribute to improved model accuracy and to reduce the total number of features considered in the model. The models will be trained on 70% of the available data, while 30% will be reserved for testing. For the classification approach, accuracy assessment will be conducted based on ranked relative yield values, while for the regression approach the model will be evaluated using R² and Root Mean Squared Error (RMSE) between predicted and actual relative yield. Additionally, a qualitative analysis will compare suitability during average weather (based on 30-year climate normals) with suitability from years when severe weather occurred (drought, flood, etc.).
In a scaled-up iteration of Objective 1, the trained suitability models will be used to predict suitability for the continental U.S. Since the DIFM soil data are only present for farms included in the study, soils information will instead be derived from SSURGO. PRISM climate data and DEM-derived terrain data will be collected for the continental U.S. Data processing will occur on High Performance Computing (HPC) systems available at the University of Arkansas and through USDA-ARS, as well as on cloud-based platforms such as Google Earth Engine. An iterative process will be used to refine the ML algorithm from Objective 1 so that it can be applied at the national scale and can account for differences in data quality and resolution between the DIFM data and national-scale data. The model will be refined based on available data at the national scale and reevaluated for accuracy on the DIFM yield monitor data before prediction is carried out at the national scale. The Cropland Data Layer will be used to mask out areas not relevant to suitability assessment, such as urban areas and water bodies. 
National suitability maps will be developed for corn and soybean and will be evaluated first using the DIFM yield monitor data (n=63), spanning roughly 3-10 years per farm. Next, the suitability predictions will be averaged for each U.S. county and validated against reported county-level yields from the National Agricultural Statistics Service. The resulting validated maps will be shared with producers and housed on a web-based tool hosted by USDA or using Google Earth Engine Apps.
The models developed in Objective 2 will be used to quantitatively estimate where, and to what degree, climate impacts may be expected for agricultural suitability in the U.S. The climate data used for this analysis will be high-resolution, bias-corrected Coupled Model Intercomparison Project Phase 5 (CMIP5) climate projections available from the CCAFS-Climate data portal (ccafs-climate.org). The CCAFS climate database has gridded climate data for four Representative Concentration Pathways (RCP 2.6, 4.5, 6.0, and 8.5) at a resolution of 30 arc-seconds (~1 km). In this study, the analysis will examine only the years 2029, 2039, and 2049 to limit the scope and focus on relevant upcoming climate impacts. The ML models developed in Objective 2 will be supplied with climate data for these years for each of the four RCPs to understand the possible range of effects of changing climate conditions on corn and soybean suitability. Model results for crop suitability under climate change will be evaluated for each crop and RCP scenario. Special attention will be given to changing spatial patterns of crop suitability to inform agricultural planning. An evaluation of areas of high risk and opportunities for climate adaptation will also be conducted. Maps will be housed on a web-based app, and a tool will be developed so that users can input new climate data as they become available and visualize effects on suitability beyond the life of this project.
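The regression evaluation described in the methods above (70% of the data for training, 30% held out for testing, with R² and Root Mean Squared Error between predicted and actual relative yield) could be sketched as follows. This is a minimal illustration in plain Python, assuming a simple random split and the standard definitions of both metrics; the function names, the seed, and the sample values are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple


def split_indices(n: int, test_fraction: float = 0.3, seed: int = 0) -> Tuple[List[int], List[int]]:
    """Randomly split n sample indices into (train, test); a random split is an assumption."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = round(n * test_fraction)
    return idx[n_test:], idx[:n_test]


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root Mean Squared Error between actual and predicted relative yield."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical held-out relative yields and model predictions.
train_idx, test_idx = split_indices(10)
actual = [0.0, 1.0]
predicted = [0.5, 0.5]
error = rmse(actual, predicted)
fit = r_squared(actual, predicted)
```

A model that always predicts the mean of the held-out values, as in this toy case, scores an R² of zero, which gives a useful baseline when reading the metric.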