Where Can We Grow? Machine Learning to Predict Climate Impacts on U.S. Corn and Soybean Suitability

Sponsoring Institution

National Institute of Food and Agriculture

Project Status

ACTIVE

Funding Source

AFRI COMPETITIVE GRANT

Reporting Frequency

Annual

Accession No.

1030799

Grant No.

2023-67011-40361

Cumulative Award Amt.

$180,000.00

Proposal No.

2022-11407

Multistate No.

(N/A)

Project Start Date

Jun 1, 2023

Project End Date

May 31, 2026

Grant Year

2023

Program Code

[A7101]- AFRI Predoctoral Fellowships

Recipient Organization
UNIVERSITY OF ARKANSAS
(N/A)
FAYETTEVILLE, AR 72703

Performing Department
(N/A)

Non Technical Summary
In coming decades, climate change will jeopardize production of staple crops in the U.S., highlighting the need for sustainable intensification and improved prediction of crop responses to climate. This project seeks to address this challenge by applying machine learning to develop multi-scale (within-field to national), pixel-based biophysical suitability maps for corn (Zea mays) and soybean (Glycine max). Using an extensive yield monitor dataset (over 60 farms throughout the U.S., spanning 3-10 years per farm), a model will be trained to predict relative yield based on climate data from the Parameter-elevation Regressions on Independent Slopes Model (PRISM), soil data from the Soil Survey Geographic Database (SSURGO), and digital elevation models. Additionally, the model will be used to predict suitability under climate change scenarios over the next three decades using data from the Coupled Model Intercomparison Project. This project will result in novel crop suitability models for corn and soybean which, coupled with predictive weather trend tools, will allow for improved response to stochastic weather events. Dynamic suitability maps from this project will be made available on cloud-based applications for research and agricultural producer planning. This fellowship will support the professional development of the applicant and enable him to conduct research that addresses the challenge of sustainable intensification and advances the state of agricultural climate change adaptation.

Animal Health Component

40%

Research Effort Categories

Basic

40%

Applied

40%

Developmental

20%

Classification

Knowledge Area (KA)	Subject of Investigation (SOI)	Field of Science (FOS)	Percent
203	2410	2060	35%
132	0430	2060	35%
101	0199	2060	30%

Knowledge Area
203 - Plant Biological Efficiency and Abiotic Stresses Affecting Plants; 132 - Weather and Climate; 101 - Appraisal of Soil Resources;

Subject Of Investigation
0430 - Climate; 0199 - Soil and land, general; 2410 - Cross-commodity research--multiple crops;

Field Of Science
2060 - Geography;

Keywords

Goals / Objectives
The main goal of this project is to apply machine learning methods to the analysis of "Big" agricultural data to extract crop-specific suitability information relevant to sustainable agricultural intensification and climate change adaptation. This project will address three main research objectives: (1) develop a machine learning model that can estimate within-field crop suitability based on yield monitor data for U.S. corn and soybean by considering site-specific climate, soil, and terrain conditions, (2) expand the machine learning model to predict crop suitability for the entire U.S. given climate, soil, and terrain conditions, and (3) use this national-scale suitability model to predict crop suitability response to climate change using representative concentration pathways described in the Coupled Model Intercomparison Project. The specifics of each objective are discussed below.

Objective 1: Develop machine learning models that can estimate within-field crop suitability on yield monitor farms for corn and soybean based on climate, soil, and terrain conditions.
Sub-objective 1.1: Cleaning and preprocessing of the yield monitor dataset from over 60 U.S. farms spanning 3-10 years per farm, collected as part of the DIFM research project.
Sub-objective 1.2: Cleaning and preprocessing of the soil, climate, and landscape terrain factors that will be used as predictive variables in the models.
Sub-objective 1.3: Model selection and feature selection. Three well-established ML models will be evaluated and compared for final use in this study: Random Forest, Support Vector Machine, and Artificial Neural Network. Model selection will be conducted based on interpretability, accuracy, and data and computational requirements. If feasible based on the amount and quality of data, preference will be given to a deep learning neural network model.
Sub-objective 1.4: Model training and cross-fold validation. Both classification and regression approaches will be explored.
Sub-objective 1.5: Model evaluation including quantitative error and accuracy metrics.

Objective 2: Expand the machine learning model to predict crop suitability for the entire United States given climate, soil, and terrain conditions.
Sub-objective 2.1: Cleaning and processing of national-scale soil, climate, and terrain data.
Sub-objective 2.2: Iterative scaling of the model from Objective 1 to be operational at the national scale.
Sub-objective 2.3: Model training and predictive mapping of U.S. crop suitability.
Sub-objective 2.4: Model evaluation and accuracy assessment using DIFM yield monitor data and county-level yields from the National Agricultural Statistics Service.
Sub-objective 2.5: Share validated suitability maps with producers and house them on a web-based tool hosted by USDA or using Google Earth Engine Apps.

Objective 3: Use the national-scale suitability model to predict crop suitability response to climate change using representative concentration pathways described in the Coupled Model Intercomparison Project.
Sub-objective 3.1: Clean and process climate data from Coupled Model Intercomparison Project Phase 5 (CMIP5) climate projections for input into the machine learning model.
Sub-objective 3.2: Evaluation of the impact of each RCP scenario on crop suitability. Special attention will be given to changing spatial patterns of crop suitability to inform agricultural planning.
Sub-objective 3.3: Development of maps and evaluation of areas of high risk and opportunities for climate adaptation.
Sub-objective 3.4: Development of a web-based app using Google Earth Engine that allows dynamic updating of suitability predictions based on new climate data.

Data dissemination objectives:
Development of a web app in Google Earth Engine in consultation with extension professionals.
Drafting of 2-3 manuscripts for publication in peer-reviewed journals.

Project Methods
The overarching goal of this project is to apply state-of-the-art ML methods to the analysis of "Big" agricultural data to extract crop-specific suitability information relevant to sustainable agricultural intensification and climate change adaptation. To accomplish this goal, this project will address three main research objectives: (1) develop a machine learning model that can estimate within-field crop suitability based on yield monitor data for corn and soybean considering site-specific climate, soil, and terrain conditions, (2) expand the machine learning model to predict crop suitability for the entire U.S. given climate, soil, and terrain conditions, and (3) use this national-scale suitability model to predict crop suitability response to climate change using representative concentration pathways described in the Coupled Model Intercomparison Project. The specifics of each objective are discussed below.

This project will use a novel empirical approach to suitability modeling that leverages a large-scale yield monitor dataset from over 60 U.S. farms spanning 3-10 years per farm, collected as part of the DIFM research project. Yield monitor data will be cleaned and interpolated to raster datasets at the highest resolution possible. Crop-specific yields will be scaled between 0 and 1 across all years to represent corn and soybean suitability in terms of relative yield. These crop-specific relative yield values will serve as the response variables in the machine learning models. This empirical approach is novel for suitability modeling because it is based on observational data rather than on mechanistic assumptions about crop responses to the environment. Soil, climate, and landscape terrain factors will be used as predictive variables in the models. Soil sampling was conducted as part of the DIFM project and includes soil texture, pH, cation exchange capacity, soil nutrients, and soil organic carbon.
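As a rough illustration of the relative-yield scaling described above, the following Python sketch applies min-max scaling to yields separately for each crop, pooling all years. The record layout, the function name relative_yield, and the sample values are hypothetical; the project text does not state whether scaling is done per field, per farm, or across the whole dataset, so pooling by crop is an assumption.

```python
from collections import defaultdict


def relative_yield(records):
    """Min-max scale yields to [0, 1] per crop, pooling all years.

    `records` is a list of (crop, year, yield_value) tuples. This layout is
    hypothetical and stands in for rasterized yield monitor pixels.
    """
    by_crop = defaultdict(list)
    for crop, _year, value in records:
        by_crop[crop].append(value)
    bounds = {crop: (min(vals), max(vals)) for crop, vals in by_crop.items()}

    scaled = []
    for crop, year, value in records:
        lo, hi = bounds[crop]
        # A crop with a single distinct yield has no range to scale over.
        rel = 0.0 if hi == lo else (value - lo) / (hi - lo)
        scaled.append((crop, year, rel))
    return scaled


# Synthetic values for illustration only, not project data.
sample = [
    ("corn", 2019, 8.0),
    ("corn", 2020, 12.0),
    ("corn", 2021, 10.0),
    ("soybean", 2019, 2.5),
    ("soybean", 2020, 4.5),
]
for row in relative_yield(sample):
    print(row)
```

Scaling each crop separately keeps corn and soybean on comparable 0 to 1 scales even though their absolute yields differ.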
Other important soil variables such as soil depth and parent material will be derived from the 10-meter gridded Soil Survey Geographic Database (SSURGO). Climate variables will be derived from 800-meter PRISM data (PRISM Climate Group, Oregon State University) and will include temperature (mean, minimum, and maximum), precipitation (mm), dew point temperature, and vapor pressure deficit. Terrain variables will be derived from 1- and 10-meter digital elevation models (DEMs) for each area and will include elevation, slope, aspect, channel depth, valley depth, topographic wetness index, and fractal component. All data will be processed to the same geographic coordinate system, resampled to the same resolution, and associated with yield outcomes using Google Earth Engine and R Project Software.

Three well-established ML models will be evaluated and compared for final use in this study: Random Forest, Support Vector Machine, and Artificial Neural Network. Model selection will be conducted based on interpretability, accuracy, and data and computational requirements. If feasible based on the amount and quality of data, preference will be given to a deep learning neural network model. Deep learning models have gained significant attention for their strong ability to predict yield outcomes and improve agricultural practices when supplied with sufficient data. Both a classification and a regression modeling approach will be examined for efficacy. In the classification approach, the deep learning model will be used to classify crop suitability (1=Poor, 2=Marginal, 3=Fair, 4=Good, 5=Optimal). The ranks will be determined by equal interval splits of relative yield values (for example, relative yield of 0-0.2 = Poor, while relative yield of 0.8-1.0 = Optimal). In the regression approach, the model will instead predict suitability as a continuous index value ranging between 0 and 1, based directly on the calculated relative yield values.
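The equal-interval ranking described above can be sketched in Python as follows. The function name suitability_class is hypothetical, the class labels follow the 1-5 list given in the text, and the treatment of values that fall exactly on a class boundary (assigned here to the higher class, except 1.0, which stays in the top class) is an assumption, since the text does not specify it.

```python
# Class labels as listed in the project text.
LABELS = {1: "Poor", 2: "Marginal", 3: "Fair", 4: "Good", 5: "Optimal"}


def suitability_class(rel_yield, n_classes=5):
    """Map a relative yield in [0, 1] to an equal-interval class from 1 to n_classes."""
    if not 0.0 <= rel_yield <= 1.0:
        raise ValueError("relative yield must lie in [0, 1]")
    # int() truncates toward zero; min() keeps a relative yield of 1.0 in the top class.
    return min(int(rel_yield * n_classes), n_classes - 1) + 1


# Illustrative values only.
for value in (0.0, 0.1, 0.5, 0.95, 1.0):
    rank = suitability_class(value)
    print(value, rank, LABELS[rank])
```

In the regression approach no binning is needed, because the relative yield itself is the target.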
Recursive feature elimination will be used to remove variables that do not contribute to improved model accuracy and to reduce the total number of features considered in the model. The models will be trained on 70% of available data, while 30% will be reserved for testing. For the classification approach, accuracy assessment will be conducted based on ranked relative yield values, while in the regression approach the model will be evaluated using R² and Root Mean Squared Error (RMSE) between predicted and actual relative yield. Additionally, a qualitative analysis will compare suitability during average weather (based on 30-year climate normals) with suitability from years when severe weather occurred (drought, flood, etc.).

In a scaled-up iteration of Objective 1, the trained suitability models will be used to predict suitability for the continental U.S. Since the DIFM soil data are only present for farms included in the study, soils information will instead be derived from SSURGO. PRISM climate data and DEM-derived terrain data will be collected for the continental U.S. Data processing will occur on High Performance Computing (HPC) systems available at the University of Arkansas and through USDA-ARS, as well as on cloud-based platforms such as Google Earth Engine. An iterative process will be used to refine the ML algorithm from Objective 1 so that it can be applied at the national scale and can account for differences in data quality and resolution between the DIFM data and national-scale data. The model will be refined based on available data at the national scale and reevaluated for accuracy on the DIFM yield monitor data before prediction is carried out at the national scale. The Cropland Data Layer will be used to mask out areas not relevant to suitability assessment, such as urban areas and water bodies.
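For reference, the 70/30 split and the regression metrics named above (R² and RMSE) can be sketched in plain Python as below. The function names and sample values are hypothetical, and a simple random split is an assumption: the text does not say how the split is drawn, and a random split over pixels does not account for spatial autocorrelation between neighboring observations.

```python
import math
import random


def train_test_split(rows, test_fraction=0.3, seed=0):
    """Shuffle rows reproducibly and hold out `test_fraction` of them for testing."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def rmse(actual, predicted):
    """Root mean squared error between observed and predicted values."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Synthetic relative yields for illustration only.
train, test = train_test_split(range(10))
observed = [0.2, 0.4, 0.6, 0.8]
modeled = [0.25, 0.35, 0.65, 0.75]
print(len(train), len(test))
print(rmse(observed, modeled), r_squared(observed, modeled))
```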
National suitability maps will be developed for corn and soybean and will be evaluated first using the DIFM yield monitor data (n=63), spanning roughly 3-10 years per farm. Next, predicted suitability will be averaged for each U.S. county and validated against reported county-level yields from the National Agricultural Statistics Service. The resulting validated maps will be shared with producers and housed on a web-based tool hosted by USDA or using Google Earth Engine Apps.

The models developed in Objective 2 will be used to quantitatively estimate where, and to what degree, climate impacts on agricultural suitability may be expected in the U.S. The climate data used for this analysis will be high-resolution, bias-corrected Coupled Model Intercomparison Project Phase 5 (CMIP5) climate projections available from the CCAFS-Climate data portal (ccafs-climate.org). The CCAFS climate database has gridded climate data for four Representative Concentration Pathways (RCP 2.6, 4.5, 6.0, and 8.5) at a resolution of 30 arc-seconds (~1 km). In this study, analysis will examine only the years 2029, 2039, and 2049 to limit the scope and focus on relevant upcoming climate impacts. The ML models developed in Objective 2 will be supplied with climate data for these years for each of the four RCPs to understand the possible range of effects of changing climate conditions on corn and soybean suitability. Model results for crop suitability under climate change will be evaluated for each crop and RCP scenario. Special attention will be given to changing spatial patterns of crop suitability to inform agricultural planning. An evaluation of areas of high risk and opportunities for climate adaptation will also be conducted. Maps will be housed on a web-based app, and a tool will be developed so that users can input new climate data as they become available and visualize effects on suitability beyond the life of this project.
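The county-level validation step described above (averaging predicted suitability per county and comparing it with reported yields) might look like the following Python sketch. The county identifiers, the yield values, and the use of a Pearson correlation as the comparison statistic are all assumptions made for illustration; the text does not name the validation statistic.

```python
import math
from collections import defaultdict


def county_means(pixels):
    """Average pixel-level suitability within each county.

    `pixels` is an iterable of (county_id, suitability) pairs (hypothetical layout).
    """
    totals = defaultdict(lambda: [0.0, 0])
    for county_id, suitability in pixels:
        totals[county_id][0] += suitability
        totals[county_id][1] += 1
    return {cid: total / count for cid, (total, count) in totals.items()}


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Synthetic pixels and yields for illustration only, not NASS data.
pixels = [
    ("county_a", 0.2), ("county_a", 0.4),
    ("county_b", 0.6), ("county_b", 0.8),
    ("county_c", 0.9),
]
reported_yield = {"county_a": 100.0, "county_b": 150.0, "county_c": 170.0}

means = county_means(pixels)
counties = sorted(means)
r = pearson_r([means[c] for c in counties], [reported_yield[c] for c in counties])
print(means)
print(r)
```

The same aggregation could be repeated for each crop, and for each RCP and target year in Objective 3, to compare county-level suitability across scenarios.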

Progress 06/01/23 to 05/31/24

Outputs
Target Audience: The target audiences reached during this reporting period include agriculture and ecology researchers, policy makers, land managers, extension professionals, and agricultural industry representatives. These audiences were reached through presentations delivered at the American Geophysical Union 2023 annual meeting in San Francisco and at the International Association of Landscape Ecologists North America annual meeting in Oklahoma City. A presentation of results from this project was also delivered to Dr. Sanah Baig, Deputy Under Secretary for Research, Education, and Economics, during her visit to the USDA-ARS research unit in Fayetteville, AR.
Changes/Problems: Nothing Reported
What opportunities for training and professional development has the project provided? The project has provided numerous opportunities for professional networking at academic conferences and with collaborators on the project. I have also enrolled in online courses on machine learning topics, including Deep Learning and Neural Network courses from Coursera. I have also had the opportunity to mentor undergraduate researchers in our lab group who have contributed some minor work on the project.
How have the results been disseminated to communities of interest? Results have so far been disseminated at national and international academic conferences and in meetings with the research group who provided data.
What do you plan to do during the next reporting period to accomplish the goals?
- Draft first manuscript based on results of objective 1
- Transfer all models to HPC systems
- Scale up and apply model at national level (objectives 2.1 - 2.5)

Impacts
What was accomplished under these goals? Of the goals listed above, objective 1 has so far been completed (including sub-objectives 1.1 - 1.5). Data processing and cleaning have been automated and used to develop a dataset for training machine learning models. Dozens of iterations of machine learning models have been compared using quantitative error and accuracy metrics.

Publications

Type: Conference Papers and Presentations Status: Other Year Published: 2024 Citation: Smith, H.W., Ashworth, A.J., Bullock, D., Kharel, T.P., Nalley, L.L., Owens, P.R. Understanding the drivers of maize and soybean yields in U.S. agricultural landscapes using machine learning. International Association of Landscape Ecologists, North America Annual Meeting, Oklahoma City, OK, April 2, 2024.
Type: Conference Papers and Presentations Status: Other Year Published: 2023 Citation: Smith, H.W., Ashworth, A.J., Bullock, D., Kharel, T.P., Nalley, L.L., Owens, P.R. Machine learning to understand and predict soil, climate, and terrain effects on suitability of U.S. maize and soybean crops. American Geophysical Union Annual Meeting, San Francisco, CA, December 14, 2023.