Source: UNIVERSITY OF FLORIDA submitted to NRP
DSFAS: HARNESSING PHENOMICS BIG DATA FOR NEAR-REAL TIME DECISION SUPPORT TO IMPROVE ELITE COTTON IDEOTYPES
Sponsoring Institution
National Institute of Food and Agriculture
Project Status
ACTIVE
Funding Source
Reporting Frequency
Annual
Accession No.
1031110
Grant No.
2023-67021-40646
Cumulative Award Amt.
$649,189.00
Proposal No.
2022-11591
Multistate No.
(N/A)
Project Start Date
Sep 1, 2023
Project End Date
Aug 31, 2026
Grant Year
2023
Program Code
[A1541] - Food and Agriculture Cyberinformatics and Tools
Recipient Organization
UNIVERSITY OF FLORIDA
G022 MCCARTY HALL
GAINESVILLE, FL 32611
Performing Department
(N/A)
Non Technical Summary
Information on cotton crop development, such as maturity, fruit retention patterns, and plant architecture, is limited or non-existent for most breeding and official variety trial programs in the U.S. because time-consuming manual measurements are infeasible. Automated phenotyping technologies and the resulting big data would empower cotton breeders to make greater genetic gains and shorten selection cycles. To address the challenges researchers face in effectively managing and efficiently processing phenomics data, the overall goal of this project is to develop data science tools and AI-based models to facilitate breeding cotton germplasm with improved ideotypes. The specific objectives of the project are to: 1) manage heterogeneous field data by developing data curation, storage, and sharing workflows; 2) develop real-time video tracking and 3D plant part segmentation deep learning models to measure organ-level, high-resolution cotton phenotypes; and 3) validate and leverage phenomics data for ideotype breeding to maximize cotton yield. Significant improvements in breeding efficiency would make the U.S. cotton industry more sustainable and competitive in the global market. The methodologies developed in this project will also benefit data-driven decision support systems for other crops and make the AI-based tools more broadly available to crop breeders and stakeholders. This standard research project is related to the program area "Data Science for Food and Agricultural Systems (DSFAS)" and addresses the AFRI priority area of "Plant health and production and plant products". In particular, this project focuses on AI and machine learning for monitoring, analytics, and automation in crop development.
Animal Health Component
100%
Research Effort Categories
Basic
0%
Applied
100%
Developmental
0%
Classification

Knowledge Area (KA) | Subject of Investigation (SOI) | Field of Science (FOS) | Percent
404 | 1719 | 1081 | 100%
Knowledge Area
404 - Instrumentation and Control Systems;

Subject Of Investigation
1719 - Cotton, other;

Field Of Science
1081 - Breeding;
Goals / Objectives
Our long-term goal is to develop data science tools and AI-based models to facilitate breeding cotton germplasm with improved ideotypes. We will develop a workflow and data management framework to collect, curate, store, and share large-volume, multi-source heterogeneous data following the FAIR (Findable, Accessible, Interoperable, and Reusable) principles. Focusing on measuring crop growth and yield-related traits with high-resolution imagery, we will develop video tracking and 3D deep learning models for cotton plant organ mapping and architectural traits related to yield components, thereby providing an alternative path to increasing yield that complements traditional breeding methods. Specifically, in this project we seek to: 1) manage heterogeneous field data by developing data collection, curation, storage, and sharing workflows; 2) develop real-time video tracking and 3D plant part segmentation deep learning models for plant organ-level high-resolution phenotyping; and 3) apply phenomics data for timely decision support on genotype selection to evaluate segregating populations for ideotype breeding.
Project Methods
Objective 1. Manage multi-scale heterogeneous field data by developing data collection, curation, storage, and sharing workflows. The proposed system is based on a three-tier computing architecture connecting field, local, and remote computing resources. The system will implement edge computing and FAIR data management principles to handle the big data challenges facing modern plant phenomics. In the field, phenotyping systems with edge computing devices can perform not only data collection but also data reduction, pre-processing, and online inference (Objective 2) using lightweight deep learning models. Pre-processed data, along with metadata, will be uploaded via a wireless network to on-farm server computers that automatically perform data decompression and integrity checks and organize and transfer the retrieved raw data to cloud-based storage services. Ultimately, the collected raw data and metadata will be transferred to cloud-based services for storage, processing, visualization, and sharing. We will leverage cloud computing and deep learning hardware (GPU-enabled computing nodes) from both the HiPerGator supercomputer and CyVerse for tasks such as CNN and 3D point cloud deep learning model training. Model training will mainly use data collected in prior years, with new data collected to fine-tune the model on the cloud using continual learning. The trained model will be deployed on the edge computing devices for inference.

Objective 2: Plant organ-level high-resolution phenotyping through real-time video tracking and 3D deep learning models.

Objective 2.1: Cotton flowering habits and yield estimation using online multi-object tracking with inference on the edge. A deep learning-based multi-object detection and tracking method operating on video frames will be deployed on the edge device to count stands, flowers, and mature bolls in real time (with detection model training performed on the cloud). We will adopt the tracking-by-detection paradigm, which consists of three main steps: detection, motion estimation, and data association. We will use the deep learning-based optical flow model Recurrent All-Pairs Field Transforms (RAFT) to estimate the nonlinear camera motion caused by disturbances from uneven terrain, adopting the lightweight variant RAFT-S for online processing. Three video cameras will be mounted on a ground robotic phenotyping platform to record multi-view (top and two side-view) videos of cotton flowers and bolls, whereas one top-view camera will be adequate for seedling counting. The three views can be registered based on camera intrinsic and extrinsic parameters to remove duplicate counts from overlapping areas. Tracking performance will be evaluated with the widely used CLEAR MOT metrics. In addition, the efficiency (in seconds) of each stage of the method, including data preprocessing, detection, motion estimation, and data association, will be recorded.
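For reference, the headline score of the CLEAR MOT suite, multi-object tracking accuracy (MOTA), combines false negatives, false positives, and identity switches against the total number of ground-truth objects. The sketch below is a minimal illustration of that computation using hypothetical per-frame counts; it is not the project's evaluation code.

```python
# Minimal sketch of the MOTA score from the CLEAR MOT metrics.
# The per-frame counts below are hypothetical, for illustration only.

def mota(frames):
    """MOTA = 1 - (sum of FN + FP + ID switches) / (total ground-truth objects)."""
    errors = sum(f["fn"] + f["fp"] + f["idsw"] for f in frames)
    gt_total = sum(f["gt"] for f in frames)
    return 1.0 - errors / gt_total

# Hypothetical evaluation of a short tracking run (e.g., cotton flowers):
frames = [
    {"gt": 12, "fn": 1, "fp": 0, "idsw": 0},
    {"gt": 14, "fn": 0, "fp": 2, "idsw": 1},
    {"gt": 13, "fn": 1, "fp": 1, "idsw": 0},
]
print(f"MOTA = {mota(frames):.3f}")  # 1 - 6/39, approximately 0.846
```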
Objective 2.2: 3D plant mapping through point cloud segmentation using deep learning. Boll mapping and plant architecture traits in cotton (such as the node of the first fruiting branch, the number of nodes to the first fruiting branch, and branching pattern) can provide detailed insights into the genetic basis of crop maturity, the critical juncture at which plants transition from the vegetative to the reproductive growth phase. In this objective, we will acquire high-resolution 3D point cloud data for boll detection and plant architecture assessment. The point cloud data will be collected by a terrestrial LiDAR sensor mounted on the ground robot with an RTK GPS to scan plants from different perspectives. Because this is an end-of-season measurement, three plants from each plot will be cut and moved to a warehouse near the field to be scanned in batch mode (10 plants without occlusions scanned at a time). Computing will be performed at three levels: the sensing platform, the farm gateway computer, and the cloud. We will evaluate the segmentation performance of the deep learning model in terms of IoU, recall, and precision; overall segmentation performance will be evaluated through mean IoU (mIoU) and accuracy, and efficiency in terms of average inference time per point cloud. Novel architectural traits will be defined and evaluated by co-PI Chee's lab.

Objective 3: Applying phenomics data in a phenotypic selection system towards ideotype breeding. The goal of this objective is to assess the genetic complexity of the traits obtained from the organ-level and 3D plant mapping phenotyping described in Objective 2, which will guide cotton breeders in identifying specific target traits that are heritable, and therefore amenable to selective breeding, to design a preferred cotton ideotype for their production environments. We will leverage the genetic resources and analytical tools developed in a previously funded NIFA project to defray the costs of population development, field evaluation, and SNP genotyping of the genetic population, while focusing on novel phenotypic traits such as flowering habits, boll positions, and plant architecture. In year 1, we will plant the 500 breeder's selections and 150 "genetic control" lines in the F4 nursery in a single-row repeating-block design so that an estimate of plot yield, corrected for nearby field variation, can be obtained. We will narrow the 500 breeder's selections down to 120 lines (on the basis of yield and fiber quality) to be planted in the following two years of field trials. By the third year of the project, the initial F4 population will have advanced through the F6 trial (end of year 2 of the funding period). For each genotype we will have image-derived data from a total of six plots across two location-years of field trials, comprising 1,440 plots with phenotypes assessed, giving us an accurate picture of trait stability and the distribution of phenotypes in each environment. The broad-sense heritability (H²) of all measured traits, calculated as the ratio of total genetic variance to total phenotypic variance, will be estimated by parent-offspring regression (F5 to F6). We will leverage the expertise of Hulse-Kemp's lab at North Carolina State University to run genomic selection models across generations and different data subsets. Our long-term goal is to develop a genomic selection model that predicts a genotype's performance (breeding values) with the highest accuracy, allowing breeders to identify genotypes with superior plant ideotypes from genotype data with a better balance of accuracy and cost/time than phenotyping data alone.
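As a minimal illustration of the parent-offspring regression step (not the project's analysis pipeline), the sketch below regresses hypothetical F6 offspring line means on F5 parent values; for near-homozygous selfed lines, the regression slope is commonly used as an approximation of heritability.

```python
# Minimal sketch of parent-offspring regression for heritability estimation.
# All trait values are hypothetical; a real analysis would use per-line
# trait means from replicated field trials (F5 parents vs. F6 offspring).
import numpy as np

f5_parent = np.array([98.2, 105.4, 93.1, 110.8, 101.7, 96.5])    # parent values
f6_offspring = np.array([97.0, 103.9, 94.6, 108.1, 100.2, 97.3])  # offspring line means

# Slope of the least-squares regression of offspring on parent values.
slope, intercept = np.polyfit(f5_parent, f6_offspring, 1)

# For near-homozygous selfed lines, the slope approximates heritability
# (with a single parent in an outcrossing species, the estimate is 2 * slope).
print(f"regression slope (heritability estimate) = {slope:.2f}")
```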

Progress 09/01/23 to 08/31/24

Outputs
Target Audience: The audience of the project includes engineers, data scientists, plant breeders and geneticists, and industry stakeholders (cotton growers and extension agents).

Changes/Problems: Nothing Reported

What opportunities for training and professional development has the project provided? In PI Li's lab, the grant primarily supported one doctoral student, Daniel Petti, at the Agricultural and Biological Engineering (ABE) Department at the University of Florida. In addition, two visiting doctoral students (Chenjiao Tan and Lizhi Jiang) were partially supported to conduct research on this project in the ABE Department. In co-PI Chee's lab, the funds supported one MS student, Dalton West.

How have the results been disseminated to communities of interest? In addition to publications and conference presentations, we demonstrated our field data collection system to the NIFA director during his visit to UF in January 2024. Li's lab also demonstrated the system for multiple undergraduate student cohorts.

What do you plan to do during the next reporting period to accomplish the goals? We plan to collect and analyze the data in year 2 and publish our results in peer-reviewed journals.

Impacts
What was accomplished under these goals?

Objective 1. Manage multi-scale heterogeneous field data by developing data collection, curation, storage, and sharing workflows. We have developed MALLARD, a robust data management system designed to streamline the collection, curation, storage, and sharing of multi-scale heterogeneous field data. MALLARD is a single-page web application comprising a Python-based backend, implemented with FastAPI (a minimal illustration of this pattern follows below), and a responsive frontend for user interaction. The backend is divided into microservices: the gateway (API management), edge (static file delivery), and transcoder (video processing). The system integrates with MARS-X, an agricultural robotics platform, enabling seamless upload and processing of field-collected data, including automated metadata extraction from EXIF tags and ROS bag files. MALLARD currently supports uploading and managing image and video data through both web-based interfaces and direct robot integration, enhancing accessibility and efficiency for researchers. In the following year, we plan to expand functionality, optimize video transcoding, and enhance data visualization and retrieval capabilities.

We have also made progress in improving the MARS-X robot platform, focusing on its data collection and integration capabilities. The platform now supports multi-camera video recording using Raspberry Pi HQ camera modules, each running ROS nodes that transmit compressed video data to an onboard controller. To streamline data processing, we integrated MARS-X with the MALLARD data management system, enabling automated extraction, transcoding, and metadata generation for video streams. These advancements have improved the robot's ability to handle and transfer field data efficiently, reducing operator workload and increasing data accessibility for research and analysis.
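The MALLARD codebase is not reproduced here; the sketch below is only a minimal illustration of the upload-plus-metadata-extraction pattern described above, assuming FastAPI and Pillow. The /images route and the response fields are hypothetical, not MALLARD's actual API.

```python
# Minimal sketch of a FastAPI upload endpoint with EXIF metadata extraction.
# Illustrative only; the /images route and response fields are hypothetical.
import io

from fastapi import FastAPI, File, UploadFile
from PIL import ExifTags, Image

app = FastAPI()

@app.post("/images")
async def upload_image(file: UploadFile = File(...)):
    data = await file.read()
    image = Image.open(io.BytesIO(data))
    # Map numeric EXIF tag IDs to human-readable names where known.
    exif = {ExifTags.TAGS.get(tag, str(tag)): str(value)
            for tag, value in image.getexif().items()}
    return {"filename": file.filename, "bytes": len(data), "exif": exif}
```

A server like this can be run locally with `uvicorn module:app` and exercised by POSTing an image as multipart form data.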
Objective 2: Plant organ-level high-resolution phenotyping through real-time video tracking and 3D deep learning models. Manual flower counting is valuable to plant breeders but is often too labor-intensive to be practical, and existing automated approaches are computationally demanding and lack user-friendly interfaces. In the first year of the project, we addressed these challenges by developing a self-supervised active learning framework to build a lightweight flower tracking model deployable on ground robots for real-time operation. Using camera and GPS data, the system also automates flower location extraction. Tested on an NVIDIA Jetson AGX Xavier, our approach achieved a mean absolute percentage error (MAPE; the computation is sketched at the end of this section) below 10% in flower counts while maintaining real-time performance, making it a practical and integrated solution for cotton phenotyping.

Cotton breeding programs require efficient phenotypic trait measurements to develop high-yield varieties, yet manual methods are time-intensive. We studied a novel 3D Gaussian splatting (3DGS) method for automated measurement using instance segmentation. A 360-degree video of a cotton plant is processed to generate 2D images, estimate camera poses, and create a sparse point cloud with COLMAP. The 3DGS model reconstructs the plant scene, and YOLOv8x generates 2D masks, which are combined with SAGA to segment individual cotton bolls and stems in 3D. Phenotypic traits, including boll count and stem length, are estimated with MAPEs of 11.43% and 10.45%, respectively. This approach improves trait measurement accuracy and offers a novel 3D tool for cotton breeding programs.

Objective 3: Applying phenomics data in a phenotypic selection system towards ideotype breeding. In the first year, our goal was to collect phenotype data; the results will be used for ideotype breeding in the later stages of the project.
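Several results above are reported as mean absolute percentage error (MAPE). For reference, the sketch below shows the standard MAPE computation on hypothetical manual versus model-predicted counts; it is not the project's evaluation code.

```python
# Minimal sketch of mean absolute percentage error (MAPE) on hypothetical counts.
import numpy as np

def mape(actual, predicted):
    """MAPE (%) = 100/n * sum(|actual - predicted| / |actual|)."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return 100.0 * np.mean(np.abs(actual - predicted) / np.abs(actual))

# Hypothetical manual vs. model-predicted boll counts for five plots:
manual = [42, 38, 51, 47, 33]
predicted = [40, 41, 48, 50, 31]
print(f"MAPE = {mape(manual, predicted):.2f}%")  # approximately 6.20%
```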

Publications

  • Type: Peer Reviewed Journal Articles Status: Published Year Published: 2024 Citation: Tan, Chenjiao; Sun, Jin; Paterson, Andrew H.; Song, Huaibo; Li, Changying. Three-view cotton flower counting through multi-object tracking and RGB-D imagery. Biosystems Engineering, vol. 246, pp. 233-247, 2024.
  • Type: Peer Reviewed Journal Articles Status: Published Year Published: 2024 Citation: Petti, Daniel; Zhu, Ronghang; Li, Sheng; Li, Changying. Graph Neural Networks for lightweight plant organ tracking. Computers and Electronics in Agriculture, vol. 225, Article 109294, 2024.
  • Type: Conference Papers and Presentations Status: Published Year Published: 2024 Citation: Jiang, Lizhi; Li, Changying; Sun, Jin; Chee, Peng; Fu, Longsheng. Estimation of Cotton Boll Number and Main Stem Length Based on 3D Gaussian Splatting. ASABE, St. Joseph, MI, 2024.
  • Type: Conference Papers and Presentations Status: Published Year Published: 2024 Citation: Tan, Chenjiao; Li, Changying; Sun, Jin; Song, Huaibo. Multi-Object Tracking for Cotton Boll Counting in Ground Videos Based on Transformer. ASABE, St. Joseph, MI, 2024.
  • Type: Conference Papers and Presentations Status: Published Year Published: 2024 Citation: Petti, Daniel J.; Li, Changying. Active Learning for Real-Time Flower Counting with a Ground Mobile Robot. ASABE, St. Joseph, MI, 2024.