Source: UNIV OF CALIFORNIA submitted to
DSFAS: HARNESSING DATA FOR ACCURATE YIELD AND OIL CONTENT PREDICTION
Sponsoring Institution
National Institute of Food and Agriculture
Project Status
NEW
Funding Source
Reporting Frequency
Annual
Accession No.
1028284
Grant No.
2022-67022-37021
Project No.
CALW-2021-11538
Proposal No.
2021-11538
Multistate No.
(N/A)
Program Code
A1541
Project Start Date
Jun 15, 2022
Project End Date
Jun 14, 2025
Grant Year
2022
Project Director
Wang, W.
Recipient Organization
UNIV OF CALIFORNIA
(N/A)
LOS ANGELES, CA 90095
Performing Department
Computer Science
Non Technical Summary
The research objective is the design, development, and field-testing of an artificially intelligent method to predict the yield and oil content of flax from a number of morphological traits before harvesting. Success in this project will result in a quantum leap for flax breeding programs by introducing a systematic, data-driven, autonomous approach in lieu of conventional heuristic-based decision making. NDSU hosts the only flax breeding program in the US, and North Dakota, the largest producer of flax (91% of US production), will serve as the testbed for this project. The project requires precision data collection on morphological traits throughout the entire life cycle of the crop. A cooperative team of unmanned aerial systems (UASs) and unmanned ground vehicles (UGVs) will be employed. The UGVs will also carry a hyperspectral camera to predict the oil content even before harvesting. The scheduling and operation of the UAS-UGV team will be dictated by a data analytics engine. The videos collected by the UASs and UGVs will be processed to extract various morphological traits and to predict the final yield of the crop through an integrated machine learning model. Hyperspectral images will be analyzed using machine learning to predict the oil content of each plot. If successful, this predictive model for yield and oil content will allow a breeder to substantially lower the costs of the breeding program and thereby improve the quality of new crop varieties. In collaboration with AmeriFlax, this framework will be tested in real-world settings.
Animal Health Component
0%
Research Effort Categories
Basic
50%
Applied
20%
Developmental
30%
Classification

Knowledge Area (KA) | Subject of Investigation (SOI) | Field of Science (FOS) | Percent
204 | 1842 | 2020 | 50%
204 | 1842 | 1060 | 50%
Goals / Objectives
The ultimate goals of this project are (a) yield prediction and (b) oil content analysis before harvesting; the known information in this case is the set of morphological traits and hyperspectral images at various stages of growth. To accomplish these goals, the project will lead to hardware and software innovations that enable crop monitoring with extreme precision at all stages of growth:
(1) Field preparation and data collection using unmanned aerial systems (UASs) and unmanned ground vehicles (UGVs);
(2) Optimization of UAS and UGV scheduling;
(3) Generation of morphological traits from images;
(4) Yield prediction using machine learning models built on morphological traits;
(5) Operating hyperspectral cameras on UGVs;
(6) Generating ground truth data from an NMR grain analyzer;
(7) Oil content prediction using machine learning models built on hyperspectral images.
Project Methods
The proposed project is divided into three interconnected tasks.

Task I will develop the tools (UGVs, UASs) and the testbed (field plots) to collect agronomic data. In this task, we will establish the groundwork for field trials and data collection. The data will be used in the subsequent tasks for predictive modeling of yield and oil content. This task is divided into three parts. (A) Experimental plot preparation: The flax breeding trials will be conducted at Casselton (46°54′0″N 97°12′38″W) and Carrington (47°27′0″N 99°7′26″W) in North Dakota. We will also conduct an experiment in the greenhouse at NDSU, flying a drone and running a robot for data collection and model development. (B) UAS-based data collection: UAS images will be collected from the flax research plots using all or some of the following UAS platforms and sensors: DJI Phantom 4 Pro and RTK (real-time kinematics) version + 20 MP RGB camera (13.2 x 8.8 mm sensor size); DJI Matrice 210 V2 + 24 MP RGB camera (23.5 x 15.6 mm sensor size); DJI Matrice 200 V2 + MicaSense Altum camera (7.12 x 5.33 mm sensor size); DJI Matrice 600 Pro + 42 MP RGB camera (24 x 35.9 mm sensor size) + PPK (post-processed kinematics) receiver. The RGB sensors coupled with different lenses will provide a variety of image resolutions and quality, increasing the likelihood of having adequate imagery to extract features of interest to build and train machine learning models. (C) UGV-based data collection: We are building on a UGV, developed by Jawed and Rahman, for precision mapping and data collection. This UGV has been tested for operation in the flax fields of North Dakota. It is able to autonomously recharge itself and can function in the field without human intervention for several days. We will apply data analytics to optimize the schedule of the UASs and UGVs, with the objective of achieving high precision in yield prediction while minimizing the cost of operating the vehicles.

Task II will use the data to formulate models to predict yield. The morphological traits in this project are divided into two classes based on the data source: those from UASs and those collected by UGVs. We plan to develop machine learning models to automatically derive these traits from images taken by the UASs and UGVs. To do so, we will first create training data by manually labeling a small subset of the images. We can consider this problem as a multi-task learning problem where each task is the derivation of a specific trait. Multi-task learning has been demonstrated to achieve better performance than building separate machine learning models for each task, because multi-task learning models are able to leverage interdependencies between tasks. This will be integrated into the prediction model presented next to boost the prediction accuracy and provide explainable features for the yield prediction. We will develop robust and effective machine learning models to enable accurate early prediction of the yield. We plan to train a few-shot learning model based on model-agnostic meta-learning (MAML) for this purpose. The input data include images (taken by the onboard cameras) and environmental information (taken by the onboard sensors) of the UASs and UGVs over time, as well as morphological traits derived from them. These data will be fed into a deep neural network model to learn a "representation" of the crop over time, which can be used to predict the yield.
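As an illustration of the MAML-based few-shot approach named above, the following is a minimal PyTorch sketch of the meta-objective (adapt a copy of the parameters on each task's support set, then evaluate on its query set). The model, task sampler, and hyperparameters are illustrative assumptions, not the project's actual implementation.

    import torch
    import torch.nn.functional as F
    from torch.func import functional_call

    def maml_meta_loss(model, tasks, inner_lr=0.01, inner_steps=1):
        """MAML meta-objective: adapt a copy of the parameters on each
        task's support set, then accumulate the loss on its query set.
        `tasks` is a list of (x_support, y_support, x_query, y_query)
        tuples; here y would be plot yield (illustrative)."""
        params = dict(model.named_parameters())
        meta_loss = 0.0
        for x_s, y_s, x_q, y_q in tasks:
            fast = dict(params)  # start each task from the shared initialization
            for _ in range(inner_steps):
                loss = F.mse_loss(functional_call(model, fast, (x_s,)), y_s)
                grads = torch.autograd.grad(loss, tuple(fast.values()),
                                            create_graph=True)
                fast = {name: p - inner_lr * g
                        for (name, p), g in zip(fast.items(), grads)}
            # Query loss evaluated through the adapted weights.
            meta_loss = meta_loss + F.mse_loss(
                functional_call(model, fast, (x_q,)), y_q)
        return meta_loss / len(tasks)

The returned loss would be backpropagated through the shared initialization with an outer optimizer, so that a few gradient steps on a new season's small labeled set suffice to adapt the predictor.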
Although deep learning models, such as VGG-Net and Inception-v3/v4, have scored substantial successes in object identification and representation learning in images, the application of these methods to agricultural tasks is in its infancy, partially due to the limited availability of training data. We have already started to collect data on 1,100 plots this year (2021) and expect to have data from a full season to train the machine learning models when the project officially starts next year. To fundamentally enhance model robustness with limited training data, we propose to apply few-shot learning to leverage knowledge from other tasks with larger datasets.

Task III will upgrade the robotic tools for hyperspectral imaging and formulate ML-based models for oil content analysis. Measurement of oil content in oilseed flax is an important parameter for commercial value and germplasm enhancement in breeding programs. The conventional chemical method (Soxhlet wet chemistry) is time consuming and expensive, and the seed must be destroyed to complete the analysis. NMR spectrometry is a rapid, nondestructive method that requires only a small amount of seed and gives an accurate measurement of the oil content of whole seeds. Seed oil content measurement by NMR takes only a few seconds, and the results are highly reproducible. This method has been used in many crops, such as flax, canola, soybean, and maize. After harvesting the crop, the total seed oil content will be characterized using an NMR instrument (MiniSpec MQ-10) from Bruker. Co-PI Rahman has access to this equipment and regularly uses it for oil content analysis. These measurements will serve as the ground truth to train our machine learning models on hyperspectral images. Unlike traditional methods such as spectroscopy and NMR, hyperspectral imaging systems have opened doors to remote sensing of various compounds through the detection of reflected light in a non-destructive and high-throughput manner. While traditional methods typically require sample processing to extract particular substances from tissues of known origin, hyperspectral imagery allows for the direct capture of reflectance, where image-based analytics can be used to interpret results, reducing field sampling and lab processing times. If accurate predictions can be made from hyperspectral data taken from in situ field plants, evaluations could be conducted prior to harvest rather than after it, and without the need for destructive sampling and extensive lab work. Ambient light is an important factor in hyperspectral imaging; however, our UGVs are small, so imaging in a controlled light environment is a challenge. Our approach is to augment the UGVs with headlights for night-time operation and mount a hyperspectral camera onto the chassis of the UGV.

Evaluation: We will evaluate the accuracy and robustness of our machine-learning-powered models for yield and oil content prediction. We will collect ground truth data each season to test the performance of our models. Data collected during one year will be added to the training data of subsequent years to further improve the models. We will also run ablation studies to further understand the benefits of each component of our workflow. In the last year of the project, we will attempt to obtain data from real farmers' fields and test the validity of our yield prediction models, in collaboration with AmeriFlax.
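To make the Task III modeling step concrete, here is a minimal sketch of regressing NMR-measured oil content on per-plot mean reflectance spectra. Partial least squares is used as a stand-in baseline common in hyperspectral chemometrics (the project does not commit to a specific model), and all shapes and data below are synthetic placeholders.

    import numpy as np
    from sklearn.cross_decomposition import PLSRegression
    from sklearn.model_selection import cross_val_score

    # X: per-plot mean reflectance spectra (n_plots x n_bands) from the
    # UGV-mounted hyperspectral camera; y: NMR-measured oil content (%).
    # Synthetic placeholders stand in for the real measurements.
    rng = np.random.default_rng(0)
    X = rng.random((120, 224))        # 120 plots, 224 spectral bands (assumed)
    y = 38.0 + 6.0 * rng.random(120)  # oil content ground truth from NMR

    pls = PLSRegression(n_components=10)
    scores = cross_val_score(pls, X, y, cv=5, scoring="r2")
    print(f"5-fold CV R^2: {scores.mean():.2f} +/- {scores.std():.2f}")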

Progress 06/15/22 to 06/14/23

Outputs
Target Audience: The main target audiences for our project include researchers, plant breeders, industry stakeholders (if applicable), and growers specializing in row crops such as flaxseed, canola, tomato, and strawberry. We intend to make the design and methodologies openly accessible in the public domain. After thorough testing of the technology in research fields, our goal is to facilitate the transfer of this technology to growers' fields for on-farm demonstrations. The aim is to encourage growers to adopt and benefit from the innovative technology in their crop cultivation practices.

Changes/Problems: In 2023, unforeseen field problems had a significant impact on our experimental trials. Specifically, at Carrington, the trials had to be abandoned due to the effects of herbicide residuals in the field. The herbicide residual effect posed challenges that led to the decision to discontinue the experimental trials at that location. Additionally, at Casselton, we observed uneven germination of the crops. This irregular germination pattern, influenced by uneven moisture, introduced variability in the growth and development of the crops at that site. In both cases, the UAS imagery was valuable in supporting those decisions.

What opportunities for training and professional development has the project provided? A doctoral candidate in Professor Rahman's research group is being funded through this grant. This student actively participated in tasks related to flying drones and operating robots in both field and greenhouse settings. Their responsibilities included establishing the planned experiments at three different locations using suitable experimental designs, overseeing the harvest of the experimental plots, and conducting the subsequent statistical analyses. An M.S. student in Dr. Flores' lab is also being supported through this grant. He is heading the efforts related to UAS image collection and basic image processing (data extraction for individual plots). In addition, that student has assisted Dr. Rahman's students in greenhouse data collection by constructing a rail platform that allows a camera to move smoothly across the tables where the plants are growing, improving the quality of the images and video collected, which are fundamental to supporting the work being carried out at UCLA. A Ph.D. student in Computer Science, two Ph.D. students in Mechanical Engineering, and three undergraduate students in Computer Science at UCLA are being supported to design and develop UGVs and machine learning models that use the collected data to extract phenotypes such as stem counts and diameters and to make predictions on yield. This work has become a core element of their theses.

How have the results been disseminated to communities of interest? A video on the practical use of drones and robots in the NDSU oilseed breeding program for high-throughput phenotyping has been made public on YouTube: https://www.youtube.com/watch?v=48V2Y_Hpqxo&t=12s

What do you plan to do during the next reporting period to accomplish the goals? The experiment is scheduled for repetition in 2024 at three locations in North Dakota, following the same experimental design employed in 2022 and 2023. We will complete the analysis of the 2023 data and prepare a manuscript for publication in a peer-reviewed journal. From a UAS image collection perspective, we aim to identify the sensors and flight times that we should focus on for data collection.
That would help us become more efficient at collecting data, since we could potentially decrease the number of flights and the number of sensors, which in turn would decrease the amount of data collected and processed. In addition, we are currently exploring options to further streamline the process and tools we use to draw individual plot boundaries on top of the drone image after each flight. That process is currently semi-automated, and we are looking to increase its level of automation by training a machine learning model to detect the plots from the drone imagery.

At UCLA, we will continue our development of machine learning models for yield prediction using UAS and UGV images. Our methodology leverages pre-trained vision models, such as ResNet18 and the Vision Transformer, for image encoding. This approach is critical for extracting meaningful features from aerial imagery captured across different plots. Our data processing strategy is divided into two principal components: spatial and temporal information integration. (1) Spatial information integration: To capture comprehensive spatial details of agricultural plots, our dataset includes images shot from varied angles for each plot, overcoming the inability of any single image to cover an entire plot effectively. To assimilate this spatial information, we employ contrastive learning: images are encoded with the aforementioned pre-trained models to generate representations, which are then projected through a feed-forward network, culminating in a loss module designed to minimize intra-plot representational distances while maximizing inter-plot distances (see the sketch after this section). This ensures that spatially related images contribute cohesively to the plot's representation. (2) Temporal information integration: Agricultural plots exhibit significant changes over time, necessitating the incorporation of temporal data for accurate yield prediction. Our dataset therefore includes images of each plot captured at regular intervals, specifically during June, July, and August. To integrate this temporal aspect, we will initially explore Recurrent Neural Networks (RNNs) and Long Short-Term Memory (LSTM) models, known for their efficacy on sequential data, and we will also consider Transformer models adapted from the time series analysis domain. By combining these spatial and temporal strategies, we aim to develop a robust framework for agricultural yield prediction that capitalizes on pre-trained vision models for initial image encoding, integrates contrastive learning for spatial information, and explores advanced models for temporal analysis. Our goal is to achieve a comprehensive understanding of each plot's characteristics over time, thereby enabling more accurate and reliable yield predictions.
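Below is a minimal PyTorch sketch of the kind of plot-level contrastive loss described above, in a supervised-contrastive style where images of the same plot are treated as positives. The function name and tensor shapes are illustrative, not the project's exact implementation.

    import torch
    import torch.nn.functional as F

    def plot_contrastive_loss(embeddings, plot_ids, temperature=0.1):
        """Pull projected embeddings of the same plot together and push
        different plots apart. embeddings: (N, D) outputs of the
        feed-forward projection head; plot_ids: (N,) integer plot labels."""
        z = F.normalize(embeddings, dim=1)
        sim = z @ z.T / temperature                      # pairwise cosine similarities
        self_mask = torch.eye(len(z), dtype=torch.bool, device=z.device)
        sim = sim.masked_fill(self_mask, float("-inf"))  # exclude self-pairs
        positives = (plot_ids.unsqueeze(0) == plot_ids.unsqueeze(1)) & ~self_mask
        log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
        # Mean negative log-probability of the positives for each anchor.
        per_anchor = -(log_prob * positives.float()).sum(1) \
                     / positives.sum(1).clamp(min=1)
        return per_anchor[positives.any(1)].mean()

Minimizing this loss shrinks intra-plot representational distances while enlarging inter-plot ones, which is exactly the objective stated for the spatial integration component.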

Impacts
What was accomplished under these goals? Field experiments were carried out in 2023 at various locations in North Dakota, including Fargo, Prosper, Casselton, Carrington, and Langdon. Within the canola breeding nursery, each site featured three distinct trials: 1) Wide Area Trial, consisting of 108 experimental plots; 2) Advanced Yield Trial, comprising 243 experimental plots; and 3) Early Generation Testing, consisting of 486 experimental plots. Within the flax breeding program, each location included three trials: 1) Uniform Regional Nursery with 108 plots; 2) Advanced Yield Trial with 162 plots; and 3) Intermediate Yield Trial with 294 plots. Due to the residual effect of herbicides at the trial site in Carrington, the canola breeding trial at that location had to be abandoned.

In 2023, we expanded the scope of unmanned aerial system (UAS) flights by increasing the number of locations and the flight frequency. Flights were carried out with different UASs and cameras to capture images that would benefit the goals of the project. The main aircraft used for the flights was a DJI Matrice 300, to which the following sensors were attached, one at a time, to collect images of the trials: DJI Zenmuse P1, Altum-PT, and RedEdge-MX Dual Camera System. To check the suitability of a cheaper solution for collecting the same type of data, a DJI Mavic 3 Multispectral was used as well. All flights were carried out at 100 ft above ground level (AGL). There were 8 flights over the canola and flax trials across all three locations, covering the entire crop growing season. As in previous years, all the UAS-collected data were processed by Dr. Flores' team, since they have the resources and expertise to process the data. In summary, the data from the cameras are transferred to a desktop computer; the images are stitched into orthomosaics using Pix4DMapper software from Pix4D; a shapefile containing the IDs of each experimental plot is created; and in-house developed tools are used in ArcGIS Pro to calculate a variety of vegetation indices and to extract statistics for those indices (mean, median, min, max, standard deviation, and range) for each single plot, which are then saved into an Excel file for further analysis (an open-source sketch of this per-plot extraction step appears after this passage). Currently, we are correlating vegetation indices with agronomic traits across all locations to identify both the best vegetation indices related to traits of interest and the best sensor to be utilized to collect the data.

In 2023, the small robot was also employed for phenotyping stem diameter both in the field and in the greenhouse. The UCLA team is developing algorithms to convert image data to digital data for precise stem diameter measurements. Additionally, a large robot (5 ft long x 5 ft wide x 4 ft high) has been constructed for whole-plot weed control and for phenotyping other agronomic traits. We have successfully developed and tested a machine-learning-driven computer vision system that accurately measures the stem diameter of canola plants. This system utilizes techniques such as semantic segmentation, keypoint detection, and depth estimation, along with various other image processing methods, to precisely determine stem sizes. Currently, this technology is undergoing thorough testing with extensive datasets to ensure its accuracy and reliability before it is shared with relevant communities and stakeholders.
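For illustration, the per-plot index extraction described above could be reproduced with open-source tools as sketched below. This is not the team's ArcGIS Pro toolchain; file names, band order, and the choice of NDVI are assumptions.

    import numpy as np
    import pandas as pd
    import rasterio
    from rasterstats import zonal_stats

    # Compute NDVI from a multispectral orthomosaic, then extract the same
    # per-plot statistics listed above (mean, median, min, max, std, range)
    # using the plot-boundary shapefile. Paths and band order are assumed.
    with rasterio.open("orthomosaic.tif") as src:
        red = src.read(3).astype(float)   # assuming band 3 = red
        nir = src.read(4).astype(float)   # assuming band 4 = near-infrared
        ndvi = (nir - red) / np.clip(nir + red, 1e-6, None)
        profile = src.profile
        profile.update(count=1, dtype="float32")

    with rasterio.open("ndvi.tif", "w", **profile) as dst:
        dst.write(ndvi.astype("float32"), 1)

    stats = zonal_stats("plots.shp", "ndvi.tif",
                        stats=["mean", "median", "min", "max", "std", "range"])
    pd.DataFrame(stats).to_excel("ndvi_plot_stats.xlsx", index=False)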
In collaboration with Dr. Flores, we conducted four UAV flights (51, 66, 72, and 77 days after seeding) over 675 canola field plots at Carrington, covering four different growth stages. The images were stitched into orthomosaics using Pix4DMapper from Pix4D, and we then used an in-house developed Python script at Dr. Flores' laboratory at NDSU to calculate and extract single-plot statistics for 38 vegetation indices (VIs). Pearson's correlations between these indices and seed yield revealed a significant association (r = 0.74****), particularly for indices such as NDVI, ENDVI, VEG, GRRI, NGRDI, MGRVI, VDVI, VARI, SAVI, and OSAVI at 66 days after seeding. Similarly, UAV flights were conducted over 555 flax field plots at Carrington across five different growth stages; Pearson's correlations between the indices and seed yield revealed a significant association (r = 0.70****) with several vegetation indices. These outcomes suggest the potential of UAV multispectral imagery as a proxy for predicting seed yield.
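A minimal sketch of the correlation analysis described above, assuming the per-plot VI statistics and seed yields have been exported to a spreadsheet as in the workflow; column and file names are illustrative.

    import pandas as pd
    from scipy.stats import pearsonr

    # Rank the vegetation indices by their Pearson correlation with
    # per-plot seed yield. Column and file names are assumptions.
    df = pd.read_excel("plot_vi_stats.xlsx")
    vi_cols = [c for c in df.columns if c not in ("plot_id", "seed_yield")]

    rows = []
    for vi in vi_cols:
        r, p = pearsonr(df[vi], df["seed_yield"])
        rows.append({"index": vi, "r": r, "p_value": p})

    ranking = pd.DataFrame(rows).sort_values("r", ascending=False)
    print(ranking.head(10))   # e.g., NDVI, SAVI, OSAVI at 66 days after seeding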

Publications

  • Type: Conference Papers and Presentations Status: Published Year Published: 2023 Citation: Rahman M, Flores P, Hosain S, Jony M, Hasan F (2023) Accelerating Breeding Efficiency by Applying High-Throughput Phenotyping and Genomic Prediction Methods in Oilseed crops. An abstract for 16th International Rapeseed Congress, September 24-27, 2023, Sydney.
  • Type: Conference Papers and Presentations Status: Accepted Year Published: 2023 Citation: Delavarpour N, Mathew J, Aduteye J, Flores P. (2023) A comparative study on deep learning algorithms performance on flax crop boll-counting from crop RGB images. 2023 ASABE Annual International Meeting.
  • Type: Conference Papers and Presentations Status: Published Year Published: 2023 Citation: Rahman M, Flores P, Hosain S, Jony M, Hasan F, Jawed K, Wang W. (2023) Accelerating Breeding Efficiency by Applying High-Throughput Phenotyping and Genomic Prediction Methods in Canola. ASA, CSSA, SSSA International Annual Meeting, 2023, Sydney, Australia.