DSFAS: Big Data-enabled Real-time Learning and Decision Making for Field-based High Throughput Plant Phenotyping

Sponsoring Institution

National Institute of Food and Agriculture

Project Status

ACTIVE

Funding Source

AFRI COMPETITIVE GRANT

Reporting Frequency

Annual

Accession No.

1030625

Grant No.

2020-67021-40004

Cumulative Award Amt.

$478,239.39

Proposal No.

2022-11642

Multistate No.

(N/A)

Project Start Date

Nov 1, 2022

Project End Date

Oct 31, 2025

Grant Year

2023

Program Code

[A1541]- Food and Agriculture Cyberinformatics and Tools

Recipient Organization
CLEMSON UNIVERSITY
(N/A)
CLEMSON,SC 29634

Performing Department
(N/A)

Non Technical Summary
Contemporary agriculture faces several new challenges imposed by environmental factors such as scarcity of arable land and climate change. Accelerated crop improvements and significant advancements in understanding of the plant's response to stresses are needed to meet the global food demand and to cope with the predicted dramatic changes in climate conditions and to ensure environmental sustainability. Repeated and quick measurement of crop phenotypic parameters is a major bottleneck in plant breeding programs and for closing the gap between genomics data and phenotype, and high-throughput phenotyping (HTP) technologies have been proposed to address this issue. Automated data collection for HTP and detailed data management and processing have recently proven their benefits in obtaining phenotypic information. However, in existing platforms, data collected from the field are stored locally and later transferred and processed offline. To tap the full potentials of phenotyping, multi-scale variety (plot/plant/field), heterogeneous and large volume data should be collected by various static and mobile sensors (e.g., robotic devices) connected through Internet of Things (IoT) technology. Although large datasets can be useful for phenotyping, they raise several challenges, e.g., combining data from various sensors/sources, provenance, contextualization, data management, storage, extracting features and visualization, which overall make it a big data problem. It is then critical to develop a new platform to collect data, handle real-time streams, analyze and manage such large datasets for HTP applications, which this research intends to do. The proposed research is important to the general public because it will: (1) satisfy and enhance human food and fiber needs by improving crop yield through the efficient use of HTP technology; and (2) sustain the economic viability of farm operations. 
The efficient, high-performance, IoT-enabled big data cyber-physical system of this research suits agriculture and food industry needs. To meet such unprecedented needs, aiming at improving crop quality and yield, this project offers a compelling research plan to build an autonomous platform utilizing smart sensors and robotic devices connected through IoT technology, as well as a suite of new analytical tools to collect, manage and analyze large datasets in order to study the morphological, physiological and pathological traits without causing damage to the plants. These data can potentially be used in combination with environmental and genotypic data to make breeding decisions, to uncover relationships between genotypes and phenotypes and for automated monitoring of plant health status to reduce qualitative and quantitative losses during crop production. The ultimate goals of the project are: (1) facilitating real-time decision making for improved field-based plant phenotyping; and (2) developing open-source data analytic platforms to improve affordability, penetration and adoption of AI technologies among the stakeholders, most importantly farmers (resulting in societal benefits). If those goals are met, the general impact would be to demonstrate how the intersection of big data analytics and IoT-enabled databases can transform farm operations and farm management, as well as HTP. It will also open up new avenues to utilize novel (and emerging) data-driven approaches in agricultural processes. Another significant impact of this project is the capability it will create to share curated and labeled phenotypic data with the scientific community.

Animal Health Component

30%

Research Effort Categories

Basic

40%

Applied

30%

Developmental

30%

Classification

Knowledge Area (KA)	Subject of Investigation (SOI)	Field of Science (FOS)	Percent
402	1719	2020	100%

Knowledge Area
402 - Engineering Systems and Equipment;

Subject Of Investigation
1719 - Cotton, other;

Field Of Science
2020 - Engineering;

Keywords

high throughput phenotyping

real-time analytic and decision making

internet of agricultural things

Goals / Objectives
The overarching goal of the proposed research is the design and implementation of a scalable high-throughput plant phenotyping (HTPP) system by harnessing potentials of big data analytics and real-time decision making. The specific objectives of this project are as follows: (1) design and validate efficient data acquisition systems and algorithms for plant phenotypic trait measurements using static sensors, as well as mobile robots equipped with advanced sensors; (2) develop data processing, reduction, storage, and real-time analytic algorithms using distributed computing tools (on the fogs) in order to only transfer useful data; (3) design computationally fast (and real-time) deep learning-based algorithms by exploiting big data storage and distributed processing tools to extract/analyze phenotypic traits from diverse sources of data; (4) design an interface to visualize real-time and historical data analysis results and spatio-temporal data; (5) implement and validate the big data pipeline and its multiple components for two case studies: cotton bloom detection/counting and water stress analysis. While the former is addressed using state-of-the-art multi-object detection techniques combined with color segmentation and image transformation methods, the latter will use deep learning-based classification methods.

Project Methods
Autonomous ground vehicles built in-house will be used to collect high velocity and accurate phenotypic cotton plant data. The sensing modalities of interest to the research team include conventional RGB, thermal, spectral, and 3D imaging. The sensors are connected through Internet-of-Things (IoT) technology to transfer, process and contextualize data. Data contextualization will be aided by the GPS technology and inertial sensors mounted on the ground vehicles. To address data contextualization, a data model (in terms of structure and organization) will be developed to define metadata and format of the data. The data model is defined in four layers, namely, user, experiment, plot/field, and sensor. A big data ecosystem will be developed that supports both batch and real-time computations. Data storage and analytics will be managed using decentralized digital objects and semantics. The data will be stored in a big data ecosystem using efficient data ingestion techniques. The data pre-processing will be done to treat/curate and prepare data to make it suitable and aligned with data analytic methods. The pre-processed data would then be used by data analytic algorithms to eventually measure phenotypic traits. For the purpose of developing a fast learning method that can be deployed in the field, single-stage object detection methods are used that, by taking an input image, can learn the class probabilities and bounding box coordinates both at once. Single-stage methods will be used for cotton flower detection, localization and counting in images acquired by our ground robotic platforms. To achieve a higher accuracy for detection and classification, color segmentation and image transformation will be employed. Furthermore, a Python-based webserver will be developed for computer and mobile devices. 
The data-driven learning models will be deployed to the webserver, where the user can create experiment(s) and see the results through a graphical user interface (GUI) dashboard. Some of the contextualization of the data based on the data model would be added through this interface. Finally, two interrelated case studies (water stress detection and bloom localization/counting) will be used to demonstrate big data-driven decision making in the proposed framework. (1) Cotton bloom localization, counting and tracking will be performed using a modified CottoTrack model with transfer learning and a single-stage detection method. Furthermore, color segmentation will be used to classify missed blooms, and edge detection will be employed to determine good features to track using a homography transformation. A real-time, multi-modal (RGB and stereoscopic depth) cotton bloom detection system will also be developed, in which disparity maps and color images will be fed into the bloom localization and tracking model. (2) Multi-modal deep learning models and fusion approaches will be developed for water stress detection of cotton plants from heterogeneous ground images (in particular, multispectral and thermal images) taken of the plants and soil, as well as historical data on plant growth rates. The goal here would be to demonstrate that learning and data fusion of deep residual neural network (ResNet) models from heterogeneous data sources (numerical, categorical and image) will outperform conventional single-modal approaches in detecting thermal stress.
To evaluate the success of the tools and big data ecosystem developed in this research, various measures will be employed. The first set of measures is associated with evaluating and comparing the performance of the proposed ensemble learning and classification approaches (in terms of accuracy and speed) for cotton bloom detection, localization and tracking, as well as water stress detection. The second set of measures will be used to quantify the results of testing at different stages/layers of the proposed big data pipeline. This is to ensure that data is being transmitted and processed without any errors and to evaluate the efficacy of the proposed pipeline.
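The four-layer data model named in the methods (user, experiment, plot/field, sensor) could be sketched as nested records. This is only an illustration under stated assumptions: every class name, field name and the ISO 8601 timestamp format below are hypothetical and are not taken from the project's actual schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SensorReading:
    """Sensor layer: one observation plus the context needed to place it."""
    sensor_type: str   # e.g. "rgb", "thermal", "spectral", "depth"
    timestamp: str     # ISO 8601 string (assumed format)
    latitude: float    # position reported by the vehicle's GPS
    longitude: float
    uri: str           # location of the raw file (hypothetical field)


@dataclass
class Plot:
    """Plot/field layer: groups the readings taken over one plot."""
    plot_id: str
    readings: List[SensorReading] = field(default_factory=list)


@dataclass
class Experiment:
    """Experiment layer: groups the plots that belong to one study."""
    name: str
    plots: List[Plot] = field(default_factory=list)


@dataclass
class User:
    """User layer: the owner of one or more experiments."""
    username: str
    experiments: List[Experiment] = field(default_factory=list)


def count_readings(user: User, sensor_type: Optional[str] = None) -> int:
    """Walk user -> experiment -> plot -> sensor and count matching readings."""
    return sum(
        1
        for exp in user.experiments
        for plot in exp.plots
        for reading in plot.readings
        if sensor_type is None or reading.sensor_type == sensor_type
    )
```

Keeping the contextual metadata (time, position, sensor type) on the lowest layer means a reading can be interpreted even after it has been moved between storage systems.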

Progress 11/01/23 to 10/31/24

Outputs
Target Audience: The target audience of this project includes stakeholders (in particular, practicing engineers, the farming industry and cotton breeders), as well as the broader research community, including graduate students and scientists working at the intersection of data science and plant sciences.
Changes/Problems: Nothing Reported
What opportunities for training and professional development has the project provided? At the UGA main campus, one PhD student worked on this project under joint supervision of the PDs. Ms. Vaishnavi Thesma, a PhD student since 2022 co-advised by PD Velni and co-PD Rains, has worked in PI Velni's lab since 2019, first as an undergraduate researcher, then as an MS student (2020-2022) and now as a PhD student in Electrical & Computer Engineering. At Clemson, Mr. Yalun Jiang (Mechanical Engineering) is adding a new dimension to this project (the design and prototyping of a device to be used for automatic harvesting) and has interacted closely with the other graduate students involved. At the UGA Tifton campus, a PhD student (Mr. Peter Ngimbwa) in Agriculture Engineering has been working on graph database development for field data organization and relationships (with graph neural networks to analyze and make predictions from data in real time). The work of the three named students resulted in two published papers (1 conference and 1 journal), with several new papers that are either in the pipeline or under review.
How have the results been disseminated to communities of interest? Data collected, as well as the outcomes of the project, are often shared with Cotton Inc., the project industry partner. In fact, the work plan and data results are shared with Cotton Inc. research staff as part of this project and another related cotton harvesting project; Cotton Inc. research staff involved in the meetings have been providing feedback on the results and directions for future activities. Our team also presented findings of this work on using algorithms we developed to estimate cotton growth at the ASABE Conference (refer to the Product section for citation) and Beltwide Cotton Conferences.
What do you plan to do during the next reporting period to accomplish the goals? (i) Finalize the integration of the current year's data and all image data into both the Neo4j graph database and the MongoDB document database. (ii) Complete the development of graph neural network models for real-time analysis and prediction of cotton yield and growth patterns. (iii) Conduct a comparative analysis of document and graph databases to evaluate their efficiency in managing data for cotton production. (iv) To support the prototyped soft gripper, we will further enhance the system by developing vision-based position control, vision-based force sensing and model-based control, and a cutting mechanism to aid with picking the crop. (v) Our team has started looking into the problem of cotton node count prediction and forecasting, for which we intend to use multivariate long short-term memory (LSTM) models to eventually create confidence intervals of the estimated average node count based on predicted uncertainty. Outcomes of the research topics described above will be presented in several conference and journal papers.
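As a rough illustration of plan item (v), the sketch below turns per-plant node count predictions and their predicted uncertainties into a confidence interval for the average node count. It does not implement an LSTM; it assumes a forecasting model has already produced a mean and a standard deviation for each plant, that prediction errors are approximately normal, and that they are independent across plants. The function name and inputs are hypothetical.

```python
import math
from typing import List, Tuple


def mean_node_count_interval(
    means: List[float], stds: List[float], z: float = 1.96
) -> Tuple[float, float, float]:
    """Return (average, lower, upper) for the average of per-plant predictions.

    z = 1.96 gives an approximate 95% interval under the normality
    and independence assumptions stated above.
    """
    if not means or len(means) != len(stds):
        raise ValueError("means and stds must be non-empty and the same length")
    n = len(means)
    avg = sum(means) / n
    # Variance of an average of independent variables: sum(sigma_i^2) / n^2
    std_err = math.sqrt(sum(s * s for s in stds)) / n
    return avg, avg - z * std_err, avg + z * std_err
```

If the per-plant errors are correlated (for example, plants in the same plot sharing the same weather), the independence assumption makes the interval too narrow, so it should be read as a lower bound on the true uncertainty.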

Impacts
What was accomplished under these goals? (i) Developed a low-cost, open-source, distributed computing architecture using Hadoop and deep learning for spatio-temporal mapping of cotton bloom appearance for an entire season; in particular, our method employed a cluster of Raspberry Pi devices in a primary-replica architecture to ingest batches of cotton image data, preprocess them, perform cotton bloom detection, and create spatio-temporal maps in parallel. The cluster achieved faster computation than a single, centralized node, with the same accuracy. (ii) Collected comprehensive cotton phenotypic data (height, chlorophyll, soil moisture, flowers, bolls, yield, quality), which we are currently analyzing. (iii) Developed a graph database (Neo4j) and a document database (MongoDB) for efficient data storage and analysis, and conducted an experiment to determine the optimal position for the top-mounted camera on the rover, ensuring that height and other cotton measurements are as accurate as manual ones. (iv) Created an algorithm that uses coordinate transformation techniques and the GPS positions of flagged plants to precisely determine their position in captured images, enabling accurate plant growth estimation. (v) Designed and prototyped a soft gripper that can interact with the crop, with the goal of assessing the maturity of the crop for harvesting.
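The report does not spell out the coordinate transformation used in item (iv). As one possible first step, the sketch below converts a flagged plant's GPS position into metres relative to the rover and then rotates it into the rover's frame. It assumes a spherical Earth and a flat local approximation (reasonable over a few tens of metres), and a heading measured clockwise from north; the function names are hypothetical, and the later camera projection step is not shown.

```python
import math
from typing import Tuple

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius (spherical approximation)


def gps_to_local_en(
    lat: float, lon: float, ref_lat: float, ref_lon: float
) -> Tuple[float, float]:
    """Return (east, north) offsets in metres of a point from a reference point."""
    d_lat = math.radians(lat - ref_lat)
    d_lon = math.radians(lon - ref_lon)
    # Meridians converge towards the poles, so east-west distance shrinks by cos(lat).
    east = EARTH_RADIUS_M * d_lon * math.cos(math.radians(ref_lat))
    north = EARTH_RADIUS_M * d_lat
    return east, north


def local_to_rover_frame(
    east: float, north: float, heading_deg: float
) -> Tuple[float, float]:
    """Rotate (east, north) into (forward, right) for a rover with the given heading.

    heading_deg is measured clockwise from north (0 = north, 90 = east).
    """
    h = math.radians(heading_deg)
    forward = east * math.sin(h) + north * math.cos(h)
    right = east * math.cos(h) - north * math.sin(h)
    return forward, right
```

With the plant expressed as (forward, right) metres from the rover, a calibrated camera model could then map that position to pixel coordinates.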

Publications

Type: Other Journal Articles Status: Published Year Published: 2024 Citation: Thesma, Vaishnavi, Glen C. Rains, and Javad Mohammadpour Velni. 2024. "Development of a Low-Cost Distributed Computing Pipeline for High-Throughput Cotton Phenotyping" Sensors 24, no. 3: 970. https://doi.org/10.3390/s24030970
Type: Conference Papers and Presentations Status: Published Year Published: 2024 Citation: Ngimbwa, P.C., Mwitta, C., Porter, W., Virk, S., Velni, J., and Rains, G., 2024. Harnessing stereo vision systems on a multipurpose intelligent ground rover for precision cotton growth monitoring. ASABE Annual International Meeting, Anaheim, CA, July 28 - 31, 2024.

Progress 11/01/22 to 10/31/23

Outputs
Target Audience: The target audience of this project includes stakeholders (in particular, practicing engineers, the farming industry and cotton breeders), as well as the broader research community, including graduate students and scientists working at the intersection of data science and plant sciences.
Changes/Problems: Nothing Reported
What opportunities for training and professional development has the project provided? At the UGA main campus, two graduate students (both female) worked on this project. The first student, Ms. Vaishnavi Thesma (a PhD student since 2022), co-advised by PI Velni and co-PD Rains, has worked in PI Velni's lab since 2019, first as an undergraduate researcher, then as an MS student (2020-2022) and now as a PhD student in Electrical & Computer Engineering. The second student, Ms. Amanda Issac, has worked in PI Velni's lab since 2020, first as an undergraduate researcher and, as of May 2021, as an MS student in the UGA AI Institute. Ms. Issac successfully defended her MS thesis in April 2023. At Clemson, Mr. Alireza Ebrahimi worked on the FACT project activities and interacted closely with the other graduate students involved. The work of the above students resulted in three conference papers and two journal papers, all published during the past reporting period. At the UGA Tifton campus, a PhD student (from under-represented groups) in Agriculture Engineering has been working mostly on building hardware systems for data collection and managing the data collection campaign.
How have the results been disseminated to communities of interest? Data collected, as well as the outcomes of the project, are often shared with Cotton Inc., the project industry partner. In fact, the work plan and data results are shared with Cotton Inc. research staff as part of this project and another related cotton harvesting project; Cotton Inc. research staff involved in the meetings have been providing feedback on the results and directions for future activities. Furthermore, our team has been holding meetings with the Microsoft Azure group to discuss opportunities for Microsoft to share resources and expertise for improving our big data pipeline.
What do you plan to do during the next reporting period to accomplish the goals? We plan to develop a low-cost, open-source, scalable, distributed computing pipeline to process vast amounts of cotton image data in parallel using Raspberry Pi, Apache Hadoop, and deep learning algorithms. Specifically, we will use Hadoop to create a "leader and follower" distributed computing architecture to ingest batches of cotton image data, preprocess them, detect and count the number of blooms per image using our tiny-YOLO models, and create a spatio-temporal map for each data collection day in parallel and in a distributed fashion. Moreover, we will create spatio-temporal maps of cotton bloom appearance over time for a plot across several weeks of the 2023 growing season. These maps will eventually give a visual understanding of the spatial distribution of bloom appearance prior to harvesting. This way, farmers will have insight into how many blooms appear, and how often, across an entire field throughout the season.
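The spatio-temporal map described above can be pictured as bloom counts aggregated per collection day and per location. The sketch below shows only that aggregation step, assuming the detector has already produced a bloom count for each image and that each image has been assigned to a (row, column) grid cell; the record layout and the grid indexing are hypothetical, and no detection model or Hadoop call is involved.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# (collection date, grid row, grid column, blooms counted in one image)
Detection = Tuple[str, int, int, int]
BloomMap = Dict[str, Dict[Tuple[int, int], int]]


def build_spatiotemporal_map(records: Iterable[Detection]) -> BloomMap:
    """Sum bloom counts per grid cell, keeping one map per collection date."""
    maps = defaultdict(lambda: defaultdict(int))
    for date, row, col, count in records:
        maps[date][(row, col)] += count
    # Convert to plain dicts so that missing keys raise instead of silently appearing.
    return {date: dict(cells) for date, cells in maps.items()}
```

Because each record is summed independently, the records can be split into batches, aggregated on separate workers, and the partial maps added together afterwards, which is what makes this step suitable for a distributed setup.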

Impacts
What was accomplished under these goals? For the 2022 harvest, we tested whether the small rover, which is driven between rows of cotton to collect data, affected cotton production in those rows. The small red rover was driven between the middle two rows of 4-row plots 1-2 times per week from emergence to just before harvest. To assess the effect on yield between the middle 2 rows, three 30' plots were selected and hand harvested. There were 3 yields per 30' row: yield between rows 1-2, yield between rows 2-3 and yield between rows 3-4. A Tukey's HSD means comparison was conducted to determine whether yield between rows 2-3 (where the rover had been driven) was significantly different from yield between rows 1-2 and 3-4. There was no significant difference that could be assigned to a rover effect.
In 2022, we also conducted a study to harvest cotton early and compare it to later-harvested cotton. This was done to assess whether harvesting cotton 2 or more times per season would improve quality over cotton harvested only once, at the end of the season. The 2022 results indicated that early-harvested cotton had higher micronaire, higher length uniformity, and better color.
In 2023, we planted 16 plots of cotton with 4 treatments and 4 reps per treatment. Each treatment is 4 rows and 30 feet long. Treatments were irrigated x no pesticide, irrigated x pesticide, not irrigated x no pesticide and not irrigated x pesticide. Eight plants per plot (128 total) were measured for height, nodes, and number of bolls throughout the season. Soil moisture and SPAD readings were also collected. Two rain gauges were also monitored daily and used to adjust irrigation of the irrigated cotton. All cotton plots were monitored using an autonomous rover with three stereo cameras, two wheel encoders, a GNSS receiver and two IMUs. Two cameras were used to assess plant height and node/boll/flower development. The third camera was used for navigation.
Plot measurements were made 2x per week from emergence until first cotton defoliation. Finally, we harvested plots by hand and by machine at 3 different dates. Data is currently being collected and analyzed. Unginned cotton was weighed for each plot and samples were sent to a cotton classification lab for quality assessment.
When working with large datasets, we require algorithms operating in parallel. In 2023, we built a preliminary Lambda Architecture to create two paths for data flow: a batch layer and a speed layer. We utilized Microsoft Azure to implement the Lambda Architecture pipeline for our scalable HTP system. Azure allowed us to ingest data from IoT devices and the cloud. The overall Lambda Architecture in Azure was as follows: Azure Event Hubs was used for ingestion in the stream layer, and Azure Data Factory was used for ingestion in the batch layer. The stream and batch layers feed into Azure Data Lake storage, which later sent data to Azure Databricks. Azure Databricks manages Spark and is an alternative to using Hadoop (as initially suggested in the proposal), as it can also be used to analyze data. Compared to Hadoop, Databricks offers more autoscaling, which makes it easier to use. It is also known to be better in terms of scalability and reliability for all data types. The pipeline was then connected to AI models and APIs. We also utilized Azure Machine Learning studio in the pipeline server layer for processing when training deep learning models. Azure Machine Learning studio can also connect to annotation labelers for real-time model training. Previously, we worked with Confluent and used an AWS S3 bucket to take advantage of the source connector in Kafka for the stream layer. We had set up a producer and still needed to set up a consumer. However, we decided to adopt the alternative approach of Event Hubs with Azure for building the overall pipeline, as it is also more cohesive.
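The two data paths of a Lambda Architecture can be illustrated with a small in-memory model. The sketch below is a teaching aid only: it makes no Azure, Spark or Kafka calls, the class and method names are hypothetical, and the event shape (a plot identifier with a bloom count) is an assumption chosen to match the project's use case.

```python
from collections import Counter
from typing import List, Tuple

Event = Tuple[str, int]  # (plot_id, bloom_count); hypothetical event shape


class LambdaPipeline:
    """Toy Lambda Architecture: batch view + speed view, merged at query time."""

    def __init__(self) -> None:
        self._master: List[Event] = []   # append-only master dataset
        self._batch_view: Counter = Counter()
        self._speed_view: Counter = Counter()

    def ingest(self, event: Event) -> None:
        """Store the event and update the low-latency speed view immediately."""
        self._master.append(event)
        plot_id, count = event
        self._speed_view[plot_id] += count

    def run_batch(self) -> None:
        """Recompute the batch view from all stored data, then clear the speed view."""
        self._batch_view = Counter()
        for plot_id, count in self._master:
            self._batch_view[plot_id] += count
        self._speed_view = Counter()

    def query(self, plot_id: str) -> int:
        """Serving layer: batch result plus whatever arrived since the last batch run."""
        return self._batch_view[plot_id] + self._speed_view[plot_id]
```

The point of the split is that queries stay current between batch runs (via the speed view), while the batch recomputation from the full master dataset periodically corrects any drift in the incremental results.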
Event Hubs uses "consumer groups", which allow multiple consuming applications to each read the event stream independently and at their own pace. An event hub comprises one or more partitions, and each event (notification) is written to one partition. We also have to create consumers and consumer groups, where a partition can be consumed by more than one consumer. Data from Azure Event Hubs was later written into Azure Blob storage; the event hub was set to write messages into a given storage container. The tools mentioned allowed us to build a pipeline for real-time analysis which is scalable and reliable.
We also developed an approach to perform binary semantic segmentation on Arabidopsis thaliana root images for plant root phenotyping while using a conditional generative adversarial network (cGAN) to address pixel-wise class imbalance. Specifically, we used Pix2PixHD, an image-to-image translation cGAN, to generate realistic and high-resolution images of plant roots and annotations similar to the original dataset. Furthermore, we used our trained cGAN to triple the size of our original root dataset to reduce pixel-wise class imbalance. We then fed both the original and generated datasets into SegNet to semantically segment the root pixels from the background. Furthermore, we postprocessed our segmentation results to close small, apparent gaps along the main and lateral roots. Lastly, we presented a comparison of our binary semantic segmentation approach with the available methods in the open literature. Our efforts demonstrated that a cGAN can produce realistic and high-resolution root images and reduce pixel-wise class imbalance, and that our segmentation model yielded high training accuracy (over 99%), low cross-entropy error (less than 2%), a high Dice score (near 0.80), and low inference time for near real-time processing.
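For readers unfamiliar with the Dice score reported above, the sketch below computes it for a pair of binary masks as twice the overlap divided by the total number of foreground pixels in both masks. It is a plain-Python illustration with masks given as nested lists of 0/1 values; it is not the project's evaluation code, and returning 1.0 when both masks are empty is a convention chosen here.

```python
from typing import Sequence


def dice_score(
    pred: Sequence[Sequence[int]], truth: Sequence[Sequence[int]]
) -> float:
    """Dice = 2 * |pred AND truth| / (|pred| + |truth|) for binary masks."""
    intersection = 0
    pred_total = 0
    truth_total = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            intersection += 1 if (p and t) else 0
            pred_total += 1 if p else 0
            truth_total += 1 if t else 0
    if pred_total + truth_total == 0:
        return 1.0  # both masks empty: treated as perfect agreement (a convention)
    return 2.0 * intersection / (pred_total + truth_total)
```

Unlike pixel accuracy, the Dice score ignores correctly labelled background pixels, which is why it can sit near 0.80 while accuracy exceeds 99% on images where thin roots occupy only a small fraction of the pixels.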

Publications

Type: Conference Papers and Presentations Status: Published Year Published: 2022 Citation: R.O. Zanone, T. Liu, and J. Mohammadpour Velni, "A drone-based prototype design and testing for under-the-canopy imaging and onboard data analytics," in Proc. 7th IFAC Conference on Sensing, Control and Automation Technologies for Agriculture (AgriControl), Munich, Germany, Sep. 2022 (IFAC PapersOnLine 55-32 (2022) 171-176).
Type: Conference Papers and Presentations Status: Published Year Published: 2022 Citation: V. Thesma, C. Mwitta, G. Rains, and J. Mohammadpour Velni, "Spatio-temporal mapping of cotton blooms appearance using deep learning," in Proc. 7th IFAC Conference on Sensing, Control and Automation Technologies for Agriculture (AgriControl), Munich, Germany, Sep. 2022 (IFAC PapersOnLine 55-32 (2022) 36-41).
Type: Conference Papers and Presentations Status: Published Year Published: 2022 Citation: A. Issac, H. Yadav, G. Rains, and J. Mohammadpour Velni, "Dimensionality reduction of high-throughput phenotyping data in cotton fields," in Proc. 7th IFAC Conference on Sensing, Control and Automation Technologies for Agriculture (AgriControl), Munich, Germany, Sep. 2022 (IFAC PapersOnLine 55-32 (2022) 153-158).
Type: Journal Articles Status: Published Year Published: 2023 Citation: A. Issac, A. Ebrahimi, J. Mohammadpour Velni, and G. Rains, "Development and deployment of a big data pipeline for field-based high-throughput cotton phenotyping data," Smart Agricultural Technology, Volume 5, October 2023, 100265 (https://doi.org/10.1016/j.atech.2023.100265).
Type: Journal Articles Status: Published Year Published: 2022 Citation: V. Thesma, and J. Mohammadpour Velni, "Plant root phenotyping using deep conditional GANs and binary semantic segmentation," Sensors 23.1, 2022, 309 (https://doi.org/10.3390/s23010309).
Type: Theses/Dissertations Status: Published Year Published: 2023 Citation: Amanda Issac, "AI-Enabled Big Data Pipeline for Plant Phenotyping and Application in Cotton Bloom Detection and Counting," MS thesis in Computer Science, University of Georgia, April 2023.