DSFAS: DeepYield - Integrating multi-scale sensing, time series, imaging and management data with artificial intelligence for crop yield prediction


Sponsoring Institution

National Institute of Food and Agriculture

Project Status

ACTIVE

Funding Source

AFRI COMPETITIVE GRANT

Reporting Frequency

Annual

Accession No.

1031579

Grant No.

2024-67021-41534

Cumulative Award Amt.

$591,306.00

Proposal No.

2022-11611

Multistate No.

(N/A)

Project Start Date

Nov 15, 2023

Project End Date

Nov 14, 2026

Grant Year

2024

Program Code

[A1541]- Food and Agriculture Cyberinformatics and Tools

Recipient Organization
UNIVERSITY OF NEBRASKA
(N/A)
LINCOLN,NE 68583

Performing Department
(N/A)

Non Technical Summary
Farming is a long journey that starts with preparing the land and ends with harvesting crops. There are many steps along the way: planting seeds, fighting off pests and disease, watering, and ensuring plants get the right nutrients. Every farm is distinct, shaped by farmers' decisions and by inherent factors such as soil quality and climate. However, farming today is facing tighter budgets, limited resources, and changing weather patterns. This means farmers need to manage their land even more carefully. They need tools that can help them predict how much crop they'll get based on their farming choices. Right now, that kind of tool is missing. Our study aims to fill that gap. We're building a tool called DeepYield, using the latest Artificial Intelligence (AI) and Deep Learning (DL) techniques. It's designed to predict sugar beet yields, from small experiments to big farms. The tool will gather data in different formats from different sources, make sense of it all, and provide farmers with clear insights about what factors most influence their yields. Plus, it'll be tailored to each farm's unique conditions, rather than offering a one-size-fits-all solution. We're also partnering with Western Sugar Cooperative, a major player in the sugar beet industry across four states (NE, CO, WY, MT). Together, we'll test DeepYield on over 1,000 real-world sugar beet fields. And to make sure the tool reaches farmers, we're creating a user-friendly website at phrec-irrigation.com. By understanding sugar beet farming better and using resources wisely, we believe this project will help farmers improve their methods. While we're focusing on sugar beets now, we think this approach could work for other crops like corn, soybean, and cotton too.

Animal Health Component

50%

Research Effort Categories

Basic

50%

Applied

50%

Developmental

(N/A)

Classification

Knowledge Area (KA)	Subject of Investigation (SOI)	Field of Science (FOS)	Percent
102	2010	2020	25%
102	2010	2080	40%
102	2010	1060	35%

Knowledge Area
102 - Soil, Plant, Water, Nutrient Relationships;

Subject Of Investigation
2010 - Sugar beet;

Field Of Science
2020 - Engineering; 1060 - Biology (whole systems); 2080 - Mathematics and computer sciences;

Keywords

early yield detection

machine learning

multi-modal data fusion

sustainable intensification

Goals / Objectives
The overall goal of this project is to develop a field-specific, artificial intelligence (AI) and deep learning (DL) based yield prediction tool named DeepYield that will consider the holistic process of on-farm crop production by integrating agricultural data that vary greatly in format, sampling frequency, and scale. Detailed objectives are: 1) Data generation and collection for the development of DeepYield in both research plots and on-farm settings via partnerships with key stakeholders; 2) Development of the DeepYield model for accurate, explainable, evolving, and field-specific yield prediction; 3) Evaluation of DeepYield, packaging of DeepYield into a web-based graphical user interface, and dissemination to stakeholders.

Project Methods
This study will collect and utilize multi-modal datasets that vary in spatial resolution (from in-situ sensors to Unmanned Aerial Vehicles (UAVs) and satellites), in sampling frequency (from automated sampling at intervals of minutes to weekly manual sampling), and in data format (from numeric data to text data). The predictor variables include the following: 1) Time-series, point-sourced numerical data, such as volumetric soil water content at different depths, canopy cover percentages, meteorological data, aboveground dry biomass, crop height, sugar beet yield at x days after planting (DAP), and sucrose content of sugar beet at x DAP; 2) Imagery temporal-spatial data, such as overhead sugar beet RGB and spectral images (robot and UAV) and satellite-based imagery (NDVI); 3) Management-related numerical data, such as plant density, length of crop season, seasonal nitrogen input, and seasonal irrigation input; 4) Management-related categorical data, such as variety, tillage, rotation, chemical applications, and soil texture. The predictor variable datasets are designed to complement each other. All data will be collected repeatedly for 3 years.
We propose to collect data in three different settings: 1. Research plots (small geographic scale, highest data resolution); 2. Intense-sampling commercial fields (larger geographic scale, medium data resolution); 3. Western Sugar Cooperative (WSC) commercial fields (largest geographic scale, lowest data resolution). Data collected at research plots and intense-sampling commercial fields will be used to develop and evaluate DeepYield, while data collected at WSC fields (over 1,000 fields with a total acreage of ~110,000 acres) will be used to extensively evaluate DeepYield. 
This setup allows the development of DeepYield using detailed data while being able to evaluate its robustness and performance using incomplete field data, which is often the case in the real world.
After data is collected, the DeepYield AI model will be developed, transferred, and evaluated. The design of DeepYield aims to integrate all datasets proposed earlier while providing excellent explainability compared with "black-box" AI/DL models. The development of DeepYield will involve the following steps: 1) multi-channel feature extraction, 2) within-channel attention mechanism, 3) cross-channel attention mechanism, and 4) fusion for prediction. Here, a channel refers to one type of input dataset. Strategies were also considered to improve model generalizability and robustness under limited sample sizes. Such strategies include: (1) using data augmentation techniques, (2) pre-training the feature extractors for images and time series data based on large public datasets, (3) adopting parameter-efficient designs for the feature extractors such as EfficientNet, and (4) enforcing shared parameters for channels with highly correlated datasets.
The end goal of this study is to adopt DeepYield at the field level. However, due to the discrepancies in environmental conditions and management practices between research plots and on-farm fields, the model developed at one place may not fit the other. Furthermore, due to practical constraints, some in-season datasets may not be collected as frequently at on-farm fields as in research plots, or may be unavailable or missing at on-farm fields. These constraints prevent us from learning "one DeepYield model that fits all". To tackle these unique challenges, we propose a novel transfer learning (TL) approach that will adjust and tailor DeepYield to each unique on-farm field based on available datasets from that field. 
This is similar to the "recalibration" or "on-site calibration" processes of many mechanistic crop models, but is based on data-driven learning. Lastly, two questions are of great interest to the users (i.e., growers) of DeepYield: (1) early prediction, which concerns how early in the growing season an accurate prediction of yield/sucrose can be made, since this can help growers with early planning; and (2) continuous improvement of the prediction capability as data accumulate. We propose to use continual learning to address these two questions. The continual learning capability can be used to continuously update the prediction model for each field over time (this project will collect three years of data for each on-farm field). With this capability, we expect that the accuracy, generalizability, and robustness of the model can be continuously improved for field-specific prediction.
This project will collect in-season datasets and end-of-season yield/sucrose for 3 consecutive years from research plots and on-farm fields. We will first train DeepYield on the research plot data from year 1 and use the proposed TL to fine-tune the model using on-farm data from year 1. To evaluate early-prediction capability, we will use the proposed continual learning to progressively update the model with the data accumulated each month, and compare each model with the one trained on all the data from the growing season. For all the models, we will use cross-validation-based prediction accuracy to tune hyper-parameters. We will apply the trained year 1 model to the data from years 2 and 3 (as blind test sets), which will allow us to evaluate the generalization performance of the models. Furthermore, since our data collection will be repeated for 3 years, this provides us an opportunity to continuously improve the models with accumulated data in each field. Specifically, we will use the proposed continual learning to update the aforementioned year 1 models by combining year 1 and year 2 data. 
We will compare these models on year 3 data (as a blind test set).
DeepYield will be evaluated in four aspects: (1) Accuracy assessment: we will compute the mean absolute prediction error and the predictive correlation between the predicted and true yield/sucrose values. (2) Explainability assessment: DeepYield will output attention weights, which can be used to locate important time segments, subregions of each field, and in-season variables. The identified time segments or variables will help the team verify whether the findings make sense by comparing them against well-known mechanistic crop growth models and principles. (3) Generalizability assessment: we will train DeepYield on all but one of the on-farm fields and test its performance on the remaining field (treated as a "new" field that our model will be applied to in the future). We will rotate through all fields in this way and assess the generalization capability of the model. (4) Computational efficiency assessment: we will report the training and inference times of DeepYield under different computing environments (e.g., HPC vs. desktop computer; cloud-based vs. local computing resources). This assessment is important for knowing the computing resource requirements for deploying DeepYield in the future.
The project's outcomes will be communicated through both traditional and contemporary channels. These include field days, specialized grower meetings (such as the WSC Joint Research Committee annual meeting), regional and national conferences, dedicated websites, and multimedia content. Notably, we'll develop a web-based graphical user interface (GUI) on UNL's platform (https://phrec-irrigation.com). UNL and GaTech principal investigators will collaborate closely to design the DeepYield web pages, ensuring they're accessible to growers and stakeholders both during and after the growing season. We'll monitor and gather analytics on website usage, assessing engagement annually. 
Additionally, we'll produce concise YouTube tutorials offering step-by-step guidance on the project's scope and website utilization. Peer-reviewed journal articles and extension publications will also be pursued throughout the project's duration.
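The accuracy and generalizability assessments described in the methods above (mean absolute prediction error, predictive correlation, and leave-one-field-out testing) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the function names are hypothetical, the data are synthetic, and an ordinary least-squares model stands in for DeepYield, whose actual implementation is not described in this report.

```python
import numpy as np

def mean_absolute_error(y_true, y_pred):
    """Mean absolute prediction error between observed and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))

def predictive_correlation(y_true, y_pred):
    """Pearson correlation between observed and predicted values."""
    return float(np.corrcoef(y_true, y_pred)[0, 1])

def fit_linear(X, y):
    """Placeholder model (NOT DeepYield): least squares with an intercept."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_linear(coef, X):
    """Predict with the placeholder least-squares model."""
    return np.column_stack([np.ones(len(X)), X]) @ coef

def leave_one_field_out(X, y, field_ids, fit=fit_linear, predict=predict_linear):
    """Hold out each field in turn, train on the rest, predict the held-out field."""
    y_pred = np.empty(len(y), dtype=float)
    for field in np.unique(field_ids):
        held_out = field_ids == field
        model = fit(X[~held_out], y[~held_out])
        y_pred[held_out] = predict(model, X[held_out])
    return y_pred

# Synthetic example: 6 hypothetical fields, 5 sampling locations each,
# 3 made-up predictors (e.g., seasonal irrigation, nitrogen, plant density).
rng = np.random.default_rng(0)
field_ids = np.repeat(np.arange(6), 5)
X = rng.normal(size=(30, 3))
y = 60.0 + X @ np.array([4.0, 2.0, -1.0]) + rng.normal(scale=0.5, size=30)

y_hat = leave_one_field_out(X, y, field_ids)
print("MAE:", mean_absolute_error(y, y_hat))
print("Predictive correlation:", predictive_correlation(y, y_hat))
```

Passing a different fit/predict pair to leave_one_field_out would allow the same rotation to be applied to any other model, which is why the model is kept as an argument in this sketch.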

Progress 11/15/23 to 11/14/24

Outputs
Target Audience: The target audience encompasses local governmental and industry partners, farmers and communities, and students.
Changes/Problems: A key modification to our original research plan involves the GaTech robotic imaging system. Implementation challenge with the original plan: the deployment of the GaTech robotic imaging system for 3D imagery collection in outdoor settings encountered technical barriers. The system, primarily designed for indoor use, required substantial modifications and field testing for outdoor operations. Due to time constraints and budget limitations in the 2024 growing season, these necessary adaptations could not be completed. Strategic solution and path forward: in response to these challenges, we have developed an alternative approach that offers several advantages. We will transition to greenhouse experiments at the GaTech facility, which provides a more controlled environment for the robotic imaging system and is both logistically and economically viable. This modification enables year-round experimentation and data collection, expanding our research capabilities. Progress has already been made in this direction: seed transfer agreements have been established with KWS Seeds, LLC, Minnesota; seed shipment from UNL is scheduled for late November/December 2024; and active planning meetings for the 2025 greenhouse experiments are underway.
What opportunities for training and professional development has the project provided? The project has provided field-training opportunities to community college students. The content of this project was also incorporated into special courses, which provided training for undergraduate students. The project was presented at grower meetings to provide training to growers and industry.
How have the results been disseminated to communities of interest? One presentation was made at the annual Western Sugar research meeting in January 2024. 
We have provided several field visits for four community college students from Western Nebraska Community College (WNCC). The Hyperspectral Gaussian Splatting paper has been submitted to the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2025.
What do you plan to do during the next reporting period to accomplish the goals?
Objective 1: Data Collection Expansion. For the upcoming 2025 growing season, we will strengthen our data collection efforts through several key initiatives. Research Plot Continuation: We will maintain the established experimental protocol at the research plots, collecting the same comprehensive set of parameters to ensure data consistency and build upon our existing findings. Enhanced On-Farm Monitoring: We plan to expand our commercial field sampling network by increasing the number of participating farms. This expansion will provide broader geographical coverage and more diverse growing conditions for our analysis. Industry Data Integration: We anticipate receiving access to Western Sugar Cooperative's extensive dataset in 2025, which will significantly enhance our understanding of large-scale production patterns. Advanced Imaging Studies: A new greenhouse experiment will be established at the Georgia Tech facility, specifically designed to leverage its advanced robotic imaging system. This controlled-environment study will enable high-precision data collection and detailed plant phenotyping.
Objective 2: Model Development. We plan to use Gaussian Splatting based approaches to more accurately estimate sugar yield in agricultural plots. We plan to collect and use real-world data to demonstrate the effectiveness of our approach. We also plan to incorporate multi-modal data beyond soil water sensors, such as drone/satellite imagery, canopy cover metrics, management data, and biomass predicted by other team members.
Objective 3: Future Implementation. We expect to start evaluation of the model in 2025.

Impacts
What was accomplished under these goals?
Objective 1: Data Collection and Management. In 2024, the team focused on comprehensive data collection from three primary sources: research plots, on-farm fields, and Western Sugar Cooperative. The team also developed techniques to generate data from advanced sensors (e.g., hyperspectral cameras). A detailed breakdown follows. Research Plot Experiments: At the Panhandle Research, Extension, and Education Center of the University of Nebraska-Lincoln, we established 48 experimental plots incorporating 8 irrigation treatments with 6 replicates. The research team collected extensive data spanning management metrics (planting date, population), growth indicators (biomass), environmental measurements (soil moisture content), remote sensing data (UAV-RGB and multispectral imagery), and production metrics (yield and sugar content). On-Farm Fields: Three commercial sugar beet production fields were selected for intensive sampling. Each field was equipped with a 4-ft soil moisture sensor to monitor soil profile moisture content throughout the growing season. We have gathered satellite RGB imagery throughout the season, while yield and management information will be available upon completion of the ongoing harvest. Industry Collaboration: We maintain active communication with Dr. Rebecca Larson, Vice President for Research at Western Sugar Cooperative, regarding large-scale on-farm data collection. While these data are not available during the current reporting period, the collaboration continues to progress. Hyperspectral 3D Reconstruction: We developed a 3D Gaussian Splatting based approach for hyperspectral 3D reconstruction. This approach produces highly accurate renders from novel views, and the resulting 3D scene representation can be used for applications such as non-destructive nutrient analysis and crop monitoring. 
Data Management: All collected data have been systematically organized and centralized through the Microsoft 365 platform, ensuring seamless access for all team members.
Objective 2: Model Development. A preliminary machine learning model using soil water sensor data has been developed to predict sugar beet productivity. Specifically, we utilized functional principal component analysis (FPCA) and partial least squares regression (PLSR) methods to provide explainability for yield prediction. We also converted the time series data into an image format and used deep learning feature extraction to assess the value of this more sophisticated approach. These approaches can be used to provide water supply recommendations during the cultivation period.
Objective 3: Future Implementation. As the project is in its initial phase, activities related to Objective 3 have not yet commenced.
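As a rough illustration of the FPCA and PLSR analysis mentioned under Objective 2, the sketch below applies both methods to synthetic soil water curves. It rests on several assumptions that are not taken from the project: the curves are sampled on a common, regular daily grid (so FPCA reduces to PCA of the centered curves), PLSR is implemented as a single-response NIPALS PLS1, all function names are hypothetical, and the data are simulated rather than measured.

```python
import numpy as np

def fpca(curves, n_components):
    """FPCA for curves on a shared regular grid: PCA of the mean-centered curves."""
    mean_curve = curves.mean(axis=0)
    centered = curves - mean_curve
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    explained = s**2 / np.sum(s**2)          # share of variance per component
    components = vt[:n_components]           # principal modes of variation over time
    scores = centered @ components.T         # one score per plot and component
    return mean_curve, components, scores, explained[:n_components]

def pls1_fit(X, y, n_components):
    """Single-response partial least squares regression (NIPALS PLS1)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk
        w /= np.linalg.norm(w)               # weight vector over time points
        t = Xk @ w                           # latent scores
        tt = t @ t
        p = Xk.T @ t / tt                    # X loadings
        qk = yk @ t / tt                     # y loading
        Xk = Xk - np.outer(t, p)             # deflate X
        yk = yk - qk * t                     # deflate y
        W.append(w); P.append(p); q.append(qk)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)   # regression coefficient per time point
    return x_mean, y_mean, coef

def pls1_predict(model, X):
    """Predict the response from new curves with a fitted PLS1 model."""
    x_mean, y_mean, coef = model
    return (X - x_mean) @ coef + y_mean

# Synthetic example: 48 plots, 120 daily soil water readings (values are made up).
rng = np.random.default_rng(0)
day = np.linspace(0.0, 1.0, 120)
a, b = rng.normal(size=48), rng.normal(size=48)
curves = (0.25 + 0.05 * np.outer(a, np.sin(np.pi * day))
          + 0.02 * np.outer(b, np.cos(2 * np.pi * day))
          + 0.002 * rng.normal(size=(48, 120)))
yield_obs = 60.0 + 5.0 * a + 0.5 * rng.normal(size=48)

_, modes, scores, explained = fpca(curves, n_components=2)
model = pls1_fit(curves, yield_obs, n_components=2)
print("Variance explained by first two FPCA modes:", explained.sum())
print("Day index with largest PLSR coefficient:", int(np.argmax(np.abs(model[2]))))
```

In a sketch like this, explainability comes from inspecting the FPCA modes and the per-day PLSR coefficients, which indicate the parts of the season where soil water differences are most associated with yield.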

Publications

Type: Peer Reviewed Journal Articles Status: Submitted Year Published: 2025 Citation: Sunil Kumar Narayanan, Lingjun Zhao, Yongsheng Chen, Lu Gan, (2024) Hyperspectral Gaussian Splatting, HSGS (CVPR), Submitted under revision.