Source: UNIVERSITY OF GEORGIA submitted to
FACT-AI: BIG DATA-ENABLED REAL-TIME LEARNING AND DECISION MAKING FOR FIELD-BASED HIGH THROUGHPUT PLANT PHENOTYPING
Sponsoring Institution
National Institute of Food and Agriculture
Project Status
EXTENDED
Funding Source
Reporting Frequency
Annual
Accession No.
1023705
Grant No.
2020-67021-32461
Project No.
GEOW-2019-07509
Proposal No.
2019-07509
Multistate No.
(N/A)
Program Code
A1541
Project Start Date
Aug 1, 2020
Project End Date
Jul 31, 2024
Grant Year
2020
Project Director
Mohammadpour Velni, J.
Recipient Organization
UNIVERSITY OF GEORGIA
200 D.W. BROOKS DR
ATHENS, GA 30602-5016
Performing Department
(N/A)
Non Technical Summary
Contemporary agriculture faces several new challenges imposed by environmental factors such as scarcity of arable land and climate change. Accelerated crop improvement and significant advances in understanding plants' responses to stresses are needed to meet global food demand, cope with the predicted dramatic changes in climate conditions, and ensure environmental sustainability. Rapid, repeated measurement of crop phenotypic parameters is a major bottleneck in plant breeding programs and in closing the gap between genomic data and phenotype, and high-throughput phenotyping (HTP) technologies have been proposed to address this issue. Automated data collection for HTP and detailed data management and processing have recently proven their benefits in obtaining phenotypic information. However, in existing platforms, data collected from the field are stored locally and later transferred and processed offline. To tap the full potential of phenotyping, multi-scale (plot/plant/field), heterogeneous, and large-volume data should be collected by various static and mobile sensors (e.g., robotic devices) connected through Internet of Things (IoT) technology. Although large datasets can be useful for phenotyping, they raise several challenges, e.g., combining data from various sensors/sources, provenance, contextualization, data management, storage, feature extraction, and visualization, which together make this a big data problem. It is therefore critical to develop a new platform to collect data, handle real-time streams, and analyze and manage such large datasets for HTP applications, which this research intends to do.

The proposed research is important to the general public because it will: (1) satisfy and enhance human food and fiber needs by improving crop yield through the efficient use of HTP technology; and (2) sustain the economic viability of farm operations. The efficient, high-performance, IoT-enabled big data cyber-physical system of this research suits agriculture and food industry needs. To meet these needs, and aiming at improving crop quality and yield, this project offers a compelling research plan to build an autonomous platform utilizing smart sensors and robotic devices connected through IoT technology, as well as a suite of new analytical tools to collect, manage, and analyze large datasets in order to study morphological, physiological, and pathological traits without causing damage to the plants. These data can potentially be used in combination with environmental and genotypic data to make breeding decisions, to uncover relationships between genotypes and phenotypes, and for automated monitoring of plant health status to reduce qualitative and quantitative losses during crop production.

The ultimate goals of the project are: (1) facilitating real-time decision making for improved field-based plant phenotyping; and (2) developing open-source data analytic platforms to improve affordability, penetration, and adoption of AI technologies among stakeholders, and most importantly farmers (resulting in societal benefits). If these goals are met, the broader impact will be to demonstrate how the intersection of big data analytics and IoT-enabled databases can transform farm operations and farm management, as well as HTP. It will also open up new avenues to utilize novel (and emerging) data-driven approaches in agricultural processes.
Another significant impact of this project is the capability it will create to share curated and labeled phenotypic data with the scientific community.
Animal Health Component
0%
Research Effort Categories
Basic
40%
Applied
30%
Developmental
30%
Classification

Knowledge Area (KA) | Subject of Investigation (SOI) | Field of Science (FOS) | Percent
402 | 1719 | 2020 | 100%
Knowledge Area
402 - Engineering Systems and Equipment;

Subject Of Investigation
1719 - Cotton, other;

Field Of Science
2020 - Engineering;
Goals / Objectives
The overarching goal of the proposed research is the design and implementation of a scalable high-throughput plant phenotyping (HTPP) system by harnessing the potential of big data analytics and real-time decision making. The specific objectives of this project are as follows: (1) design and validate efficient data acquisition systems and algorithms for plant phenotypic trait measurements using static sensors, as well as mobile robots equipped with advanced sensors; (2) develop data processing, reduction, storage, and real-time analytic algorithms using distributed computing tools (on the fogs) in order to only transfer useful data; (3) design computationally fast (and real-time) deep learning-based algorithms by exploiting big data storage and distributed processing tools to extract/analyze phenotypic traits from diverse sources of data; (4) design an interface to visualize real-time and historical data analysis results and spatio-temporal data; (5) implement and validate the big data pipeline and its multiple components for two case studies: cotton bloom detection/counting and water stress analysis. While the former is addressed using state-of-the-art multi-object detection techniques combined with color segmentation and image transformation methods, the latter will use deep learning-based classification methods.
Project Methods
Autonomous ground vehicles built in-house will be used to collect high-velocity and accurate phenotypic cotton plant data. The sensing modalities of interest to the research team include conventional RGB, thermal, spectral, and 3D imaging. The sensors are connected through Internet-of-Things (IoT) technology to transfer, process, and contextualize data. Data contextualization will be aided by GPS technology and inertial sensors mounted on the ground vehicles. To address data contextualization, a data model (in terms of structure and organization) will be developed to define the metadata and format of the data. The data model is defined in four layers, namely, user, experiment, plot/field, and sensor (a minimal illustrative sketch of such a layered model is given after this section). A big data ecosystem will be developed that supports both batch and real-time computations. Data storage and analytics will be managed using decentralized digital objects and semantics. The data will be stored in a big data ecosystem using efficient data ingestion techniques. Data pre-processing will be done to treat/curate and prepare the data so that it is suitable for and aligned with the data analytic methods. The pre-processed data will then be used by data analytic algorithms to eventually measure phenotypic traits. For the purpose of developing a fast learning method that can be deployed in the field, single-stage object detection methods are used that, taking an input image, learn the class probabilities and bounding box coordinates at once. Single-stage methods will be used for cotton flower detection, localization, and counting in images acquired by our ground robotic platforms. To achieve higher accuracy for detection and classification, color segmentation and image transformation will be employed. Furthermore, a Python-based webserver will be developed for computer and mobile devices. The data-driven learning models will be deployed to the webserver, where the user can create experiment(s) and see the results through a graphical user interface (GUI) dashboard. Some of the contextualization of the data based on the data model will be added through this interface. Finally, two interrelated case studies (water stress detection and bloom localization/counting) will be used to demonstrate big data-driven decision making in the proposed framework. (1) Cotton bloom localization, counting, and tracking will be performed using a modified CottoTrack model with transfer learning and a single-stage detection method. Furthermore, color segmentation will be used to classify missed blooms, and edge detection will be employed to determine good features to track using homography transformation. A multi-modality (RGB and stereoscopic depth image), real-time cotton bloom detection system will also be developed, in which the disparity map and color images will be fed into the bloom localization and tracking model. (2) Multi-modal deep learning models and fusion approaches will be developed for water stress detection of cotton plants from heterogeneous ground images (in particular, multispectral and thermal images) taken of the plants and soil, as well as historical data on plant growth rates. The goal here is to demonstrate that learning and data fusion of deep residual neural network (ResNet) models from heterogeneous data sources (numerical, categorical, and image) will outperform conventional single-modal approaches in detecting thermal stress. To evaluate the success of the tools and big data ecosystem developed in this research, various measures will be employed.
The first set of measures is associated with evaluating and comparing the performance of the proposed ensemble learning and classification approaches (in terms of accuracy and speed) for cotton bloom detection, localization, and tracking, as well as water stress detection. The second set of measures will be used to quantify the results of testing at different stages/layers of the proposed big data pipeline, to ensure that data are being transmitted and processed without errors and to evaluate the efficacy of the proposed pipeline.
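To make the layered data model above concrete, the following is a minimal, hypothetical sketch of how the four layers (user, experiment, plot/field, sensor) could be represented; all class and field names are illustrative and are not taken from the project's actual implementation.

```python
# A minimal sketch of the four-layer data model (user, experiment, plot/field, sensor);
# all names and fields are hypothetical illustrations, not the project's implementation.
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Tuple


@dataclass
class SensorRecord:
    sensor_id: str              # e.g., "zed2_top", "emlid_rs_plus"
    modality: str               # "rgb", "stereo_depth", "thermal", "gnss", ...
    timestamp: datetime
    file_uri: str               # where the raw sample (image, point cloud, fix) is stored


@dataclass
class Plot:
    plot_id: str
    treatment: str              # e.g., "2 seeds/ft, 36 in rows"
    gps_boundary: List[Tuple[float, float]]            # (lat, lon) corners of the plot
    records: List[SensorRecord] = field(default_factory=list)


@dataclass
class Experiment:
    experiment_id: str
    planting_date: datetime
    plots: List[Plot] = field(default_factory=list)


@dataclass
class User:
    user_id: str
    experiments: List[Experiment] = field(default_factory=list)
```

A nested structure of this kind is one way the metadata at each layer could travel with the sensor data so that downstream analytics can resolve any image or GNSS fix back to its plot, experiment, and user.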

Progress 08/01/20 to 10/03/22

Outputs
Target Audience: The target audience included stakeholders (in particular, practicing engineers, the farming industry, and cotton breeders), as well as the broader research community, including graduate students and scientists working at the intersection of data science and plant sciences. Changes/Problems: The project terminated at UGA on August 15, 2022 and will move to Clemson University as soon as the transfer request is approved by NIFA. PI Velni remains committed to continuing the project at Clemson and successfully delivering on the main promises of the research proposal. What opportunities for training and professional development has the project provided? At the UGA main campus, two graduate students (both female), as well as two undergraduate students (one female), have been working on this project, some of whom received funding from this FACT grant and some of whom were supported by other fellowships from the University of Georgia. The first MS student, Ms. Vaishnavi Thesma, has worked in PI Velni's lab since 2019, first as an undergraduate researcher and then as an MS student in Electrical & Computer Engineering. She successfully defended her MS thesis in July 2022 and continued toward her PhD in PI Velni's lab starting August 2022. Similarly, Ms. Amanda Issac has worked in PI Velni's lab since 2020, first as an undergraduate researcher and now (as of May 2021) as an MS student in the UGA AI Institute. The two undergraduate students working on the project were Oliver Zanone and Ms. Himani Yadav. While the former has transitioned to the MS program in PI Velni's lab, the latter, who was a UGA Foundation Fellow in Computer Science, started a full-time job at Google as a Data Scientist. Notably, the activities of all four students involved in the FACT project resulted in conference papers that were accepted and presented at an IFAC (International Federation of Automatic Control) conference that took place in Sept. 2022. At the UGA Tifton campus, a PhD student (from under-represented groups) in Agricultural Engineering has been working mostly on building hardware systems for data collection and managing the data collection campaign. How have the results been disseminated to communities of interest? Data collected, as well as the outcomes of the project so far, have been discussed and presented to representatives from John Deere, CNH, and Cotton Inc. In particular, project progress was presented to nine representatives from John Deere's cotton production team who visited the University of Georgia, Tifton campus in July 2021. The work plan and data results have also been shared with a CNH representative during a visit to the Tifton campus. Quarterly meetings are also used to update Cotton Inc. research staff as part of this project and another related cotton harvesting project. What do you plan to do during the next reporting period to accomplish the goals? The project has terminated at UGA and will move to Clemson University due to the PI moving to Clemson. PI Velni expects to provide a subaward to Dr. Rains (co-PI) at UGA so they can continue to collaborate. Related to goal (1) above: This year, we added a new stereo camera to the rover to image under the canopy to be able to see more flower and boll development. We will use the two-camera system to collect image data with the remote-controlled rover. We have also placed soil temperature and moisture sensors in four plots of the 2-seed-per-foot treatments. We plan to place four time-lapse cameras over one plot and begin recording images once per hour at first flowering.
We will use the data to determine the best method to capture flowering in the field and to compare the imaging data to moving rover images. The imagery from the extra camera under the canopy will be analyzed to identify several growth epochs for cotton during the year, such as nodes above first white flower (NAFWF) and nodes above first cracked boll (NACB). These data are useful for identifying whether cotton growth is favoring vegetative growth instead of fruit (boll) development. Related to goal (2) above: When working with large datasets, we require algorithms operating in parallel. We plan to build a Lambda Architecture to create two paths for data flow: a batch layer and a speed layer. We will utilize Microsoft Azure to implement a Lambda Architecture pipeline for our scalable HTP system. Azure will allow us to ingest data from IoT devices and the cloud. The overall Lambda Architecture in Azure will be as follows: data are ingested in the stream layer using Azure Event Hubs, and Azure Data Factory is used for ingestion in the batch layer. The stream and batch layers connect to Azure Data Lake Storage, which later sends data to Azure Databricks. Azure Databricks manages Spark and will be an alternative to Hadoop (as was initially suggested in the proposal), as it can also be used to analyze data. Compared to Hadoop, Databricks offers better autoscaling, which makes it easier to use, and it is known to be better in terms of scalability and reliability for all data types. The pipeline can then be connected to AI models and APIs. We can also utilize Azure Machine Learning studio in the pipeline serving layer for processing when we need to train deep learning models. Azure Machine Learning studio also has the capability to connect to annotation labelers for real-time model training. Previously, we worked with Confluent and used an AWS S3 bucket to take advantage of the source connector in Kafka for the stream layer. We set up a producer and still needed to set up a consumer. However, we decided to incorporate the alternative approach of Event Hubs with Azure for building the overall pipeline, as it is also more cohesive. An event hub is comprised of one or more partitions, and each event (notification) is written to one partition. Event Hubs has the concept of consumer groups, which allows multiple consuming applications to read the stream independently and at their own pace; we will create consumers and consumer groups, where a partition may be consumed by more than one consumer. Data from Azure Event Hubs will later be written into Azure Blob Storage; we will decide how to save the information in the event hub to the blob storage, and the event hub can be set to write messages into a given storage container. The tools mentioned allow us to build a pipeline for real-time analysis that is scalable and reliable. Related to goal (3) above: (1) We intend to develop an approach to classify cotton plant leaves as either healthy or nutrient deficient. To achieve that, we will train and test different SVM models on our dataset of leaves from field images collected in 2021 and 2022. We will choose the best performing model to track the progression of nutrient-deficient leaf appearance in the same row over time. Our experiments will be used to show promise in accurately classifying and tracking the appearance of discolored leaves from field images.
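As a rough illustration of the planned SVM comparison for healthy versus nutrient-deficient leaves, the sketch below trains several kernels on stand-in feature vectors; the synthetic features, labels, and hyperparameters are placeholders for descriptors that would actually be extracted from the 2021-2022 field images.

```python
# A minimal sketch of comparing SVM kernels for healthy vs. nutrient-deficient leaves.
# The features are synthetic stand-ins; in practice they would be color/texture
# descriptors computed from leaf crops in the field images.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X_healthy = rng.normal(loc=0.20, scale=0.05, size=(100, 48))    # hypothetical descriptors
X_deficient = rng.normal(loc=0.35, scale=0.05, size=(100, 48))
X = np.vstack([X_healthy, X_deficient])
y = np.array([0] * 100 + [1] * 100)                             # 0 = healthy, 1 = deficient

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0, stratify=y)
for kernel in ("linear", "rbf", "poly"):
    clf = SVC(kernel=kernel, C=1.0).fit(X_tr, y_tr)              # one candidate model per kernel
    print(f"{kernel:>6s} kernel: test accuracy = {clf.score(X_te, y_te):.3f}")
```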
(2) We plan to develop an approach to perform binary semantic segmentation of root images using a conditional generative adversarial network (cGAN) for plant root phenotyping and to address pixel-wise class imbalance (one common loss-weighting alternative for such imbalance is sketched below). Specifically, we will use image-to-image translation cGANs to generate realistic and high-resolution images of plant roots and annotations similar to the original dataset. Furthermore, we will use our trained cGAN to increase the size of our original root dataset. We will then feed both the original and generated datasets into SegNet for binary semantic segmentation. Lastly, we will post-process our segmentation results to close the apparent gaps along the main and lateral roots. We hope to demonstrate that the cGAN can produce realistic and high-resolution root images and that our segmentation model yields high training accuracy and low cross-entropy error. Related to goal (5) above: Our test case will concern the detection of disease and nutrient deficiency in cotton plants.
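The pixel-wise class imbalance mentioned above can also be countered at the loss level; the short sketch below shows one common alternative, weighting the rare root class in a binary cross-entropy loss. This is an illustrative complement to, not a substitute for, the cGAN-based augmentation plan, and the tensor shapes and 19:1 class ratio are hypothetical.

```python
# A minimal sketch of loss weighting for pixel-wise class imbalance in binary
# root segmentation; shapes and the 19:1 background:root ratio are hypothetical.
import torch
import torch.nn as nn

# Dummy batch standing in for segmentation-network outputs and binary root masks
logits = torch.randn(4, 1, 256, 256)                      # raw per-pixel scores
masks = (torch.rand(4, 1, 256, 256) > 0.95).float()       # sparse root pixels (~5%)

pos_weight = torch.tensor([19.0])                          # up-weight the rare root class
criterion = nn.BCEWithLogitsLoss(pos_weight=pos_weight)
loss = criterion(logits, masks)
print(f"weighted BCE loss: {loss.item():.4f}")
```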

Impacts
What was accomplished under these goals? Several tasks were undertaken related to the specific objectives of this project as described above. Related to goal (1) above: (1) In 2021, data were collected to obtain the spatial and temporal distribution of cotton plant growth, flowering and cotton boll development, and flower and boll location on the cotton plant. Approximately one acre of cotton was planted on May 11, 2021. Plots were planted using a 4-row planter, and 2 rows were skipped between each 4-row pass for ease of walking the plots during data collection and harvesting. Each treatment was replicated 4 times. (2) Once per week, a UAV (3DR Solo) with a GoPro 4 camera, taking images at a rate of 2 images per second at 12 MP, was flown at 27 m altitude, and flights continue to be made. The images were stitched together using Mission Planner and Agisoft software. (3) Two to three times a week, a UGV was driven through the plots and a ZED2 stereo camera was used to collect images at approximately 15 frames per second at a resolution of 720p. The camera is angled 35 degrees below horizontal, and both the left and right lens cameras collect images. The camera's built-in IMU data are also recorded. An Emlid RS+ GNSS receiver is mounted on the UGV, and an Emlid RS2 dual-frequency receiver is mounted on the edge of the field as a base station. GPS and all other data are recorded into a ROS bag file for each treatment. (4) A data repository was created with access to each of the data files. Data from a total of 24 individual 9.1 m plots (6 treatments x 4 reps) were recorded into individual bag files. Secure access to the files through sftp has been given to all team members (in Tifton and Athens). (5) Data from 2021 were used to assess treatment response to two planting populations, two row spacings, and two harvest dates. A total of sixteen thirty-foot plots were planted on May 11, 2021. Soil samples from each plot were also collected in the fall after harvest, and pH, CEC, and organic matter were tested. The lower 24 inches of each plot was defoliated using a modified sprayer on 9/7/21, and the lower portion of the defoliated plants was hand-harvested on 9/15/21. There was a strong correlation (R² = 0.94) between the CEC of the soil and bales per acre hand-picked for the high-population plots (4 seeds per foot); for the lower population (2 seeds per foot), R² was only 0.23. On 9/23/21, the top of the cotton was defoliated, and the rest of the plots were hand-picked on 10/4/21. The correlation between bales per acre hand-picked and CEC was also much lower (R² = 0.43). Row spacing was 36 inches (n=16) and 40 inches (n=16). These data illustrate the importance of actual plant and row spacing and soil conditions in predicting crop performance. Data were collected to obtain the spatial and temporal distribution of cotton plant growth, flowering and cotton boll development, and flower and boll location on the cotton plant in 2021. This will be repeated in ½ acre of cotton planted on June 7, 2022. Plots were planted using a 4-row planter, and 4 rows were skipped between each 4-row pass for ease of walking the plots during data collection and harvesting. Two planting treatments (two seeds per foot and three seeds per foot) were applied. Each treatment was replicated 10 times. Twice per week, beginning the week of planting, a UAV (Mavic 2 Air) began taking aerial images at a rate of 2 images per second at 20 MP and 24 m altitude. Image collection will continue through harvest. The images were stitched together using Pix4D and Agisoft Metashape software.
One to two times per week, a small remote-controlled rover is driven through the plots, and two stereo cameras are used to collect images. A ZED2 stereo camera is oriented to take images over the top of the plants at 90 degrees (vertical). An Emlid RS+ GNSS receiver is mounted on the rover, and an Emlid RS2 dual-frequency receiver is mounted on the edge of the field as a base station. GPS and all other data are recorded into a ROS bag file for each treatment. Related to goal (2) above: We implemented data reduction methods to reduce the size of image datasets from cotton fields in order to allow data to be transferred more quickly over poor internet connections. We investigated dimensionality reduction methods to accomplish this goal. Specifically, we used Principal Component Analysis (PCA) to compress image data into a smaller-dimensional space which, when reconstructed, retains much of the variability of the original image (a minimal illustrative sketch of this reduction step is given after this answer). To demonstrate the ability of PCA to produce quality reconstructions, we considered the use case of detecting cotton bloom flowering patterns with reconstructed images. We employed Open Source Computer Vision (OpenCV) to generate pixel-wise masks, which both further reduce the byte size of the data and successfully identify cotton bloom flowering. The results indicated a high amount of data reduction from the original to the reconstructed images; byte sizes were reduced by 93% through PCA while preserving around 98% of the variance with a much smaller number of components. The results demonstrated great potential in employing machine learning techniques for the data reduction pre-processing step prior to performing subsequent analysis. Related to goal (3) above: (1) A prototype platform was developed (as a proof of concept) that can automatically collect and analyze images in real time for leaf counting. To achieve this, a robotic platform was developed capable of navigating between plant rows, capturing top-view images, and then detecting and counting the number of leaves in real time. For detection and counting, a Tiny-YOLOv3 model (a lightweight deep learning model) was adapted and trained to accurately count leaves in images acquired with our robotic platform. Using our trained model, a complete list of locations and dimensions of bounding boxes was generated to identify and count the number of leaves in images. Along with the YOLO model, we also provided a comparison with another state-of-the-art object detection method, namely Faster R-CNN. (2) Our training and testing datasets were released. The images used in the dataset were captured over the course of one month from a group of 60 Arabidopsis plants using a high-quality camera. To obtain the labeled data, the images were taken and then each leaf was labeled with a bounding box representing its location. Transfer learning-based models using Tiny-YOLOv3 were implemented to detect larger, mature leaves of the Arabidopsis plant without retraining the entire model from scratch. The model was first trained on images with smaller leaves, organized by timestamp. This trained model was then used to detect, localize, and count larger leaves. (3) We developed an approach to create spatio-temporal maps using deep learning to visualize (also related to goal 4) cotton bloom appearance over time. Specifically, we manually annotated cotton flower image data and trained three state-of-the-art fast deep neural network models (namely, YOLOv3, YOLOv4, and R-CNN) to count cotton blooms and their frequency over time prior to harvesting.
We used the detection results of the best performing model, combined with traditional pixel-based image analysis methods, to create a map of where past and future blooms grow on a mid-stage cotton plant. Related to goal (4) above: Currently proposed solutions for plant phenotyping using autonomous vehicles include aerial photography via drones and under-the-canopy imaging using ground vehicles. We proposed a hybrid solution that reaps the benefits of the maneuverability of drones while providing the same quality of under-the-canopy images that can be obtained from ground vehicles. The system we built and tested utilized a telescopic arm to deploy a camera into the plant foliage. This technology is vital in detecting early-stage issues such as diseases, bacteria, and pests that arise in crop plants. Related to goal (5) above: Our specific case studies for goals 2 and 3 described above were cotton bloom detection and counting.
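As referenced above, the following is a minimal sketch of the PCA-plus-masking reduction idea applied to a single image; the file names, component count, and HSV thresholds are hypothetical and do not reproduce the project's actual pipeline or its reported reduction figures.

```python
# A minimal sketch of per-channel PCA compression plus pixel-wise masking for
# data reduction; file names, component counts, and thresholds are hypothetical.
# Assumes the image is at least 50 pixels in each dimension.
import cv2
import numpy as np
from sklearn.decomposition import PCA

img = cv2.imread("plot_frame.png")                     # hypothetical field image (BGR)
reconstructed = np.zeros_like(img)
for c in range(3):                                     # compress each color channel separately
    channel = img[:, :, c].astype(np.float32)
    pca = PCA(n_components=50)                         # far fewer components than image width
    scores = pca.fit_transform(channel)                # rows of the channel treated as samples
    reconstructed[:, :, c] = np.clip(pca.inverse_transform(scores), 0, 255)
    print(f"channel {c}: variance preserved = {pca.explained_variance_ratio_.sum():.3f}")

# Pixel-wise masking of bloom-like (near-white) pixels further shrinks the stored data
hsv = cv2.cvtColor(reconstructed, cv2.COLOR_BGR2HSV)
mask = cv2.inRange(hsv, (0, 0, 200), (180, 60, 255))   # hypothetical HSV thresholds
cv2.imwrite("plot_frame_mask.png", mask)
```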

Publications

  • Type: Journal Articles Status: Published Year Published: 2020 Citation: M. Buzzy, V. Thesma, M. Davoodi, and J. Mohammadpour Velni, Real-Time Plant Leaf Counting Using Deep Object Detection Networks, Sensors, 20(23): 6896, 2020. https://doi.org/10.3390/s20236896
  • Type: Conference Papers and Presentations Status: Published Year Published: 2022 Citation: R.O. Zanone, T. Liu, and J. Mohammadpour Velni, A drone-based prototype design and testing for under-the-canopy imaging and onboard data analytics, in Proc. 7th IFAC Conference on Sensing, Control and Automation Technologies for Agriculture (AgriControl), Munich, Germany, Sep. 2022.
  • Type: Conference Papers and Presentations Status: Published Year Published: 2022 Citation: V. Thesma, C. Mwitta, G. Rains, and J. Mohammadpour Velni, Spatio-temporal mapping of cotton blooms appearance using deep learning, in Proc. 7th IFAC Conference on Sensing, Control and Automation Technologies for Agriculture (AgriControl), Munich, Germany, Sep. 2022.
  • Type: Conference Papers and Presentations Status: Published Year Published: 2022 Citation: A. Issac, H. Yadav, G. Rains, and J. Mohammadpour Velni, Dimensionality reduction of high-throughput phenotyping data in cotton fields, in Proc. 7th IFAC Conference on Sensing, Control and Automation Technologies for Agriculture (AgriControl), Munich, Germany, Sep. 2022.
  • Type: Theses/Dissertations Status: Published Year Published: 2022 Citation: Vaishnavi Thesma, "Modern Computer Vision Applications for Plant Phenotyping in Agriculture," MS Thesis, School of ECE, University of Georgia, completed in July 2022.


Progress 08/01/21 to 07/31/22

Outputs
Target Audience: The target audience included stakeholders (in particular, practicing engineers, the farming industry, and cotton breeders), as well as the broader research community, including graduate students and scientists working at the intersection of data science and plant sciences. Changes/Problems: Nothing Reported What opportunities for training and professional development has the project provided? At the UGA main campus, two graduate students (both female), as well as two undergraduate students (one female), have been working on this project, some of whom are receiving funding from this FACT grant. The first MS student, Ms. Vaishnavi Thesma, has worked in PI Velni's lab since 2019, first as an undergraduate researcher and currently (starting Fall 2020) as an MS student in Electrical & Computer Engineering. Similarly, Ms. Amanda Issac has worked in PI Velni's lab since 2020, first as an undergraduate researcher and now (as of May 2021) as an MS student in the UGA AI Institute. The two undergraduate students working on the project were Oliver Zanone and Ms. Himani Yadav. While the former has transitioned to the MS program in PI Velni's lab, the latter, who was a UGA Foundation Fellow in Computer Science, has moved to Google as a Data Scientist. Notably, the activities of all four students involved in the FACT project have resulted in conference papers accepted for presentation and publication in the proceedings of an IFAC (International Federation of Automatic Control) conference taking place in Oct. 2022. At the UGA Tifton campus, a PhD student (from under-represented groups) in Agricultural Engineering has been working mostly on building hardware systems for data collection and managing the data collection campaign. How have the results been disseminated to communities of interest? Data collected, as well as the outcomes of the project, are often shared with Cotton Inc., the project's industry partner. In fact, the work plan and data results are shared with Cotton Inc. research staff as part of this project and another related cotton harvesting project; Cotton Inc. research staff involved in the meetings have been providing feedback on the results and directions for future activities. What do you plan to do during the next reporting period to accomplish the goals? Related to goal (1) above: This year, we added a new stereo camera to the rover to image under the canopy to be able to see more flower and boll development. We will use the two-camera system to collect image data with the remote-controlled rover. We have also placed soil temperature and moisture sensors in four plots of the 2-seed-per-foot treatments. We plan to place four time-lapse cameras over one plot and begin recording images once per hour at first flowering. We will use the data to determine the best method to capture flowering in the field and to compare the imaging data to moving rover images. The imagery from the extra camera under the canopy will be analyzed to identify several growth epochs for cotton during the year, such as nodes above first white flower (NAFWF) and nodes above first cracked boll (NACB). These data are useful for identifying whether cotton growth is favoring vegetative growth instead of fruit (boll) development. Related to goal (2) above: When working with large datasets, we require algorithms operating in parallel. We plan to build a Lambda Architecture to create two paths for data flow: a batch layer and a speed layer.
We will utilize Microsoft Azure to implement a Lambda Architecture pipeline for our scalable HTP system. Azure will allow us to ingest data from IoT devices and the cloud. The overall Lambda Architecture in Azure will be as follows: data are ingested in the stream layer using Azure Event Hubs, and Azure Data Factory is used for ingestion in the batch layer (a minimal sketch of publishing events to the stream layer is given after this answer). The stream and batch layers connect to Azure Data Lake Storage, which later sends data to Azure Databricks. Azure Databricks manages Spark and will be an alternative to Hadoop (as was initially suggested in the proposal), as it can also be used to analyze data. Compared to Hadoop, Databricks offers better autoscaling, which makes it easier to use, and it is known to be better in terms of scalability and reliability for all data types. The pipeline can then be connected to AI models and APIs. We can also utilize Azure Machine Learning studio in the pipeline serving layer for processing when we need to train deep learning models. Azure Machine Learning studio also has the capability to connect to annotation labelers for real-time model training. Previously, we worked with Confluent and used an AWS S3 bucket to take advantage of the source connector in Kafka for the stream layer. We set up a producer and still needed to set up a consumer. However, we decided to incorporate the alternative approach of Event Hubs with Azure for building the overall pipeline, as it is also more cohesive. An event hub is comprised of one or more partitions, and each event (notification) is written to one partition. Event Hubs has the concept of consumer groups, which allows multiple consuming applications to read the stream independently and at their own pace; we will create consumers and consumer groups, where a partition may be consumed by more than one consumer. Data from Azure Event Hubs will later be written into Azure Blob Storage; we will decide how to save the information in the event hub to the blob storage, and the event hub can be set to write messages into a given storage container. The tools mentioned allow us to build a pipeline for real-time analysis that is scalable and reliable. Related to goal (3) above: (1) We intend to develop an approach to classify cotton plant leaves as either healthy or nutrient deficient. To achieve that, we will train and test different SVM models on our dataset of leaves from field images collected in 2021 and 2022. We will choose the best performing model to track the progression of nutrient-deficient leaf appearance in the same row over time. Our experiments will be used to show promise in accurately classifying and tracking the appearance of discolored leaves from field images. (2) We plan to develop an approach to perform binary semantic segmentation of root images using a conditional generative adversarial network (cGAN) for plant root phenotyping and to address pixel-wise class imbalance. Specifically, we will use image-to-image translation cGANs to generate realistic and high-resolution images of plant roots and annotations similar to the original dataset. Furthermore, we will use our trained cGAN to increase the size of our original root dataset. We will then feed both the original and generated datasets into SegNet for binary semantic segmentation. Lastly, we will post-process our segmentation results to close the apparent gaps along the main and lateral roots.
We hope to demonstrate that the cGAN can produce realistic and high-resolution root images and that our segmentation model yields high training accuracy and low cross-entropy error. Related to goal (5) above: Our test case will concern the detection of disease and nutrient deficiency in cotton plants.
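As referenced above, the following is a minimal sketch (assuming the azure-eventhub Python SDK) of how field-image metadata events could be published into the planned Event Hubs stream layer; the connection string, hub name, and payload fields are hypothetical.

```python
# A minimal sketch of publishing a field-image metadata event to Azure Event Hubs
# using the azure-eventhub SDK; the connection string, hub name, and payload
# fields are hypothetical placeholders.
import json
from azure.eventhub import EventHubProducerClient, EventData

producer = EventHubProducerClient.from_connection_string(
    conn_str="Endpoint=sb://<namespace>.servicebus.windows.net/;...",  # placeholder
    eventhub_name="htp-field-images",
)

event_payload = {
    "plot_id": "plot_07",
    "sensor": "zed2_top",
    "timestamp": "2022-07-15T10:32:00Z",
    "blob_uri": "https://<account>.blob.core.windows.net/raw/plot_07/frame_0412.png",
}

with producer:
    batch = producer.create_batch()                  # events are sent in batches
    batch.add(EventData(json.dumps(event_payload)))
    producer.send_batch(batch)                       # consumer groups read the stream independently
```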

Impacts
What was accomplished under these goals? Several tasks were accomplished related to the major goals of this project as described above. Related to goal (1) above: Data from 2021 were used to assess treatment response to two planting populations, two row spacings, and two harvest dates. A total of sixteen thirty-foot plots were planted on May 11, 2021 with a 4-row John Deere MaxEmerge air planter. Soil samples from each plot were also collected in the fall after harvest, and pH, CEC, and organic matter were tested. The lower 24 inches of each plot was defoliated using a modified sprayer on 9/7/21, and the lower portion of the defoliated plants was hand-harvested on 9/15/21. There was a strong correlation (R² = 0.94) between the CEC of the soil and bales per acre hand-picked for the high-population plots (4 seeds per foot); for the lower population (2 seeds per foot), R² was only 0.23. On 9/23/21, the top of the cotton was defoliated, and the rest of the plots were hand-picked on 10/4/21. The correlation between bales per acre hand-picked and CEC was also much lower (R² = 0.43). Analysis of variance and means testing were performed to determine whether bales per acre were significantly affected by plant spacing and population in the hand-harvested plots. Row spacing was 36 inches (n=16) and 40 inches (n=16). The population seeding rate was 2 seeds per foot (n=17) and 4 seeds per foot (n=15). Population did not significantly affect bales per acre at 36-inch row spacing (p=0.175), but did affect it at 40-inch spacing (p=0.025). However, bales per acre for the low-seeded population were significantly affected by row spacing (p=0.0058), whereas those for the high population were not (p=0.13). These data illustrate the importance of actual plant and row spacing and soil conditions in predicting crop performance. Stereo camera images from 2021 were recorded as ROS bag files. The files were loaded into the ROS visualization tool Rviz and, using the known distance from the camera to the ground, crop height was recorded for each 30-foot treatment. These data are still being analyzed to assess how height changes over the season and how that can be used with yield data to understand crop performance. Data were collected to obtain the spatial and temporal distribution of cotton plant growth, flowering and cotton boll development, and flower and boll location on the cotton plant in 2021. This will be repeated in ½ acre of cotton planted on June 7, 2022. Plots were planted using a 4-row planter, and 4 rows were skipped between each 4-row pass for ease of walking the plots during data collection and harvesting. Two planting treatments (two seeds per foot and three seeds per foot) were applied. Each treatment was replicated 10 times. Twice per week, beginning the week of planting, a UAV (Mavic 2 Air) began taking aerial images at a rate of 2 images per second at 20 MP and 24 m altitude. Image collection will continue through harvest. The images were stitched together using Pix4D and Agisoft Metashape software. One to two times per week, a small remote-controlled rover is driven through the plots, and two stereo cameras are used to collect images. A ZED2 stereo camera is oriented to take images over the top of the plants at 90 degrees (vertical). A second ZED stereo camera is mounted to take images at zero degrees (horizontally) approximately six inches above the ground. Images are being collected at 5 frames per second simultaneously and will be used to get a complete view of the plant from the top and within the canopy.
The cameras' built-in IMU data are also recorded. An Emlid RS+ GNSS receiver is mounted on the rover, and an Emlid RS2 dual-frequency receiver is mounted on the edge of the field as a base station. GPS and all other data are recorded into a ROS bag file for each treatment. Related to goal (2) above: We implemented data reduction methods to reduce the size of image datasets from cotton fields in order to allow data to be transferred more quickly over poor internet connections. We investigated dimensionality reduction methods to accomplish this goal. Specifically, we used Principal Component Analysis (PCA) to compress image data into a smaller-dimensional space which, when reconstructed, retains much of the variability of the original image. To demonstrate the ability of PCA to produce quality reconstructions, we considered the use case of detecting cotton bloom flowering patterns with reconstructed images. We employed Open Source Computer Vision (OpenCV) to generate pixel-wise masks, which both further reduce the byte size of the data and successfully identify cotton bloom flowering. The results indicated a high amount of data reduction from the original to the reconstructed images; byte sizes were reduced by 93% through PCA while preserving around 98% of the variance with a much smaller number of components. Bitwise masking with OpenCV yielded a 99% reduction in file size. The results demonstrated great potential in employing machine learning techniques for the data reduction pre-processing step prior to performing subsequent analysis. This data reduction is a crucial step in developing our field-based HTP big data pipeline that will be examined in Year 3. Furthermore, we successfully ingested field cotton data (from the year 2021) into the batch layer. This was done using Azure Data Factory. We first copied data into Azure Blob Storage, a generic file store that can be used for both structured and unstructured data; to do so, we had to create a storage account and a resource group for the blob storage (a minimal sketch of this upload step is given at the end of this answer). Next, we copied the data from Blob Storage into Azure Data Lake Storage Gen2. The next step is to work on the stream layer with Azure Event Hubs. Related to goal (3) above: We developed an approach to create spatio-temporal maps using deep learning to visualize (also related to goal 4) cotton bloom appearance over time. Specifically, we manually annotated cotton flower image data and trained three state-of-the-art fast deep neural network models (namely, YOLOv3, YOLOv4, and R-CNN) to count cotton blooms and their frequency over time prior to harvesting. We used the detection results of the best performing model, combined with traditional pixel-based image analysis methods, to create a map of where past and future blooms grow on a mid-stage cotton plant. The results of our best model provide a visual understanding of how many cotton flowers grow, with an F1 score above 0.95, a true positive rate of 98%, false negative and false positive rates both under 10%, and millisecond-scale inference time for real-time processing. Related to goal (4) above: Currently proposed solutions for plant phenotyping using autonomous vehicles include aerial photography via drones and under-the-canopy imaging using ground vehicles. We proposed a hybrid solution that reaps the benefits of the maneuverability of drones while providing the same quality of under-the-canopy images that can be obtained from ground vehicles.
The system we built and tested utilized a telescopic arm to deploy a camera into the plant foliage. On-board image processing was utilized to identify plant features, with the results visualized on mobile devices. This technology is vital in detecting early-stage issues such as diseases, bacteria, and pests that arise in crop plants. Related to goal (5) above: Our specific case studies for goals 2 and 3 described above were cotton bloom detection and counting.
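As referenced above, the following is a minimal sketch (assuming the azure-storage-blob Python SDK) of the batch-layer upload into Blob Storage, from which the data were then copied into Data Lake Storage Gen2; the connection string, container name, and local folder are hypothetical.

```python
# A minimal sketch of uploading a local field dataset into Azure Blob Storage
# using the azure-storage-blob SDK; account details, container name, and the
# local folder are hypothetical placeholders.
import os
from azure.storage.blob import BlobServiceClient

service = BlobServiceClient.from_connection_string(
    "DefaultEndpointsProtocol=https;AccountName=<account>;AccountKey=<key>;..."  # placeholder
)
container = service.get_container_client("cotton-2021-raw")

local_dir = "field_data_2021"                     # hypothetical local dataset folder
for name in os.listdir(local_dir):
    path = os.path.join(local_dir, name)
    with open(path, "rb") as fh:
        # Blob name mirrors the local file name; overwrite keeps the sketch idempotent
        container.upload_blob(name=name, data=fh, overwrite=True)
        print(f"uploaded {name}")
```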

Publications

  • Type: Conference Papers and Presentations Status: Accepted Year Published: 2022 Citation: R.O. Zanone, T. Liu, and J. Mohammadpour Velni, A drone-based prototype design and testing for under-the-canopy imaging and onboard data analytics, accepted and to appear in Proc. 7th IFAC Conference on Sensing, Control and Automation Technologies for Agriculture (AgriControl), Munich, Germany, Sep. 2022.
  • Type: Conference Papers and Presentations Status: Accepted Year Published: 2022 Citation: V. Thesma, C. Mwitta, G. Rains, and J. Mohammadpour Velni, Spatio-temporal mapping of cotton blooms appearance using deep learning, accepted and to appear in Proc. 7th IFAC Conference on Sensing, Control and Automation Technologies for Agriculture (AgriControl), Munich, Germany, Sep. 2022.
  • Type: Conference Papers and Presentations Status: Accepted Year Published: 2022 Citation: A. Issac, H. Yadav, G. Rains, and J. Mohammadpour Velni, Dimensionality reduction of high-throughput phenotyping data in cotton fields, accepted and to appear in Proc. 7th IFAC Conference on Sensing, Control and Automation Technologies for Agriculture (AgriControl), Munich, Germany, Sep. 2022.
  • Type: Theses/Dissertations Status: Published Year Published: 2022 Citation: Vaishnavi Thesma, "Modern Computer Vision Applications for Plant Phenotyping in Agriculture," MS Thesis, School of ECE, University of Georgia, completed in July 2022.


Progress 08/01/20 to 07/31/21

Outputs
Target Audience: The target audience in the first year included stakeholders (in particular, practicing engineers, the farming industry, and cotton breeders), as well as the broader research community, including graduate students and scientists working at the intersection of data science and plant sciences. Changes/Problems: Nothing Reported What opportunities for training and professional development has the project provided? At the UGA main campus, two graduate students (both female) have been working on this project, both of whom are recipients of other fellowships within the University of Georgia and hence not financially supported by the FACT grant. Ms. Vaishnavi Thesma has worked in PI Velni's lab since 2019, first as an undergraduate researcher and currently (starting Fall 2020) as an MS student in Electrical & Computer Engineering. Similarly, Ms. Amanda Issac has worked in PI Velni's lab since 2020, first as an undergraduate researcher and now (as of May 2021) as an MS student in the AI Institute. Ms. Thesma has been given the mentorship opportunity to work closely with Ms. Issac on 3-D plant (and root) segmentation. How have the results been disseminated to communities of interest? Data collected, as well as the outcomes of the project so far, have been discussed and presented to representatives from John Deere, CNH, and Cotton Inc. In particular, project progress was presented to nine representatives from John Deere's cotton production team who visited the University of Georgia, Tifton campus in July 2021. The work plan and data results have also been shared with a CNH representative during a visit to the Tifton campus. Quarterly meetings are also used to update Cotton Inc. research staff as part of this project and another related cotton harvesting project. What do you plan to do during the next reporting period to accomplish the goals? Several activities are planned for the next year. These include: (1) Data collection for the 2021 season will be completed and bolls hand-harvested at the end of the season. Next year's plots will be planted in May 2022, and data collection will be repeated with added sensors and both mobile and static sensing. (2) The data collected from the various imaging modalities will be annotated and analyzed using various deep learning methods (Faster R-CNN, YOLO, and Tiny-YOLO) and deployed on both drones and UGVs for real-time analytics both on the cloud and on the edge (UGV, drone). (3) We plan to study how different planting treatments affect plant architecture, boll production, location, quality, and yield of cotton. Using the models of item (2), measurements of plant growth, growth rate, leaf architecture, boll location, soil properties, and plant spectral reflectance will be combined to predict crop production. (4) We will develop an interface to visualize point cloud (spatio-temporal), real-time, and historical data analysis results.
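As a small starting point for the planned point-cloud visualization interface, the sketch below (assuming the open3d package) loads, downsamples, and displays a single plot-level cloud; the file name and voxel size are hypothetical.

```python
# A minimal sketch of loading and viewing a plot-level point cloud with Open3D;
# the file name and voxel size are hypothetical placeholders, e.g. a cloud
# exported from ZED2 depth data for one plot on one date.
import open3d as o3d

pcd = o3d.io.read_point_cloud("plot_07_2021-08-12.ply")   # hypothetical exported cloud
pcd = pcd.voxel_down_sample(voxel_size=0.01)               # thin the cloud for interactive viewing
print(pcd)                                                  # number of points remaining
o3d.visualization.draw_geometries([pcd], window_name="Plot 07 - Aug 12, 2021")
```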

Impacts
What was accomplished under these goals? Several tasks were undertaken related to the specific objectives of this project as described above. Related to goal (1) above: (1) Data were collected to obtain the spatial and temporal distribution of cotton plant growth, flowering and cotton boll development, and flower and boll location on the cotton plant. Approximately one acre of cotton was planted on May 11, 2021. Plots were planted using a 4-row planter, and 2 rows were skipped between each 4-row pass for ease of walking the plots during data collection and harvesting. The planter was custom-designed to quickly adjust row spacing. Two planting treatments (hill-drop vs. single), two row spacings (91 cm vs. 1.02 m), and two seeding populations (low vs. high) were randomly assigned to 9.1 m long x 4-row plots. Each treatment was replicated 4 times. (2) Once per week, beginning the week of planting, a UAV (3DR Solo) with a GoPro 4 camera, taking images at a rate of 2 images per second at 12 MP, was flown at 27 m altitude, and flights continue to be made. The images were stitched together using Mission Planner and Agisoft software. (3) Two to three times a week, a UGV was driven through the plots and a ZED2 stereo camera was used to collect images at approximately 15 frames per second at a resolution of 720p. The camera is angled 35 degrees below horizontal, and both the left and right lens cameras collect images. The camera's built-in IMU data are also recorded. An Emlid RS+ GNSS receiver is mounted on the UGV, and an Emlid RS2 dual-frequency receiver is mounted on the edge of the field as a base station. GPS and all other data are recorded into a ROS bag file for each treatment. (4) A data repository was created with access to each of the data files. Data from a total of 24 individual 9.1 m plots (6 treatments x 4 reps) were recorded into individual bag files. Secure access to the files through sftp has been given to all team members (in Tifton and Athens). Related to goal (3) above: (1) A prototype platform was developed (as a proof of concept) that can automatically collect and analyze images in real time for leaf counting. To achieve this, a robotic platform was developed capable of navigating between plant rows, capturing top-view images, and then detecting and counting the number of leaves in real time. For detection and counting, a Tiny-YOLOv3 model (a lightweight deep learning model) was adapted and trained to accurately count leaves in images acquired with our robotic platform. Using our trained model, a complete list of locations and dimensions of bounding boxes was generated to identify and count the number of leaves in images. Along with the YOLO model, we also provided a comparison with another state-of-the-art object detection method, namely Faster R-CNN. (2) Our training and testing datasets were released. The images used in the dataset were captured over the course of one month from a group of 60 Arabidopsis plants using a high-quality DSLR camera from a top-down perspective. To obtain the labeled data, the images were taken and then each leaf was labeled with a bounding box representing its location. (3) Transfer learning-based models using Tiny-YOLOv3 were implemented to detect larger, mature leaves of the Arabidopsis plant without retraining the entire model from scratch. The model was first trained on images with smaller leaves, organized by timestamp. This trained model was then used to detect, localize, and count larger leaves.
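As a rough illustration of how a trained Tiny-YOLOv3 leaf detector can be run and its detections counted, the sketch below uses OpenCV's DNN module; the .cfg/.weights file names, input size, and thresholds are hypothetical stand-ins for the project's trained model.

```python
# A minimal sketch of counting leaf detections from a Darknet-format Tiny-YOLOv3
# model via OpenCV's DNN module; model files, input size, and thresholds are
# hypothetical placeholders.
import cv2
import numpy as np

net = cv2.dnn.readNetFromDarknet("tiny_yolov3_leaf.cfg", "tiny_yolov3_leaf.weights")
layer_names = net.getUnconnectedOutLayersNames()

img = cv2.imread("arabidopsis_topview.png")
h, w = img.shape[:2]
blob = cv2.dnn.blobFromImage(img, 1 / 255.0, (416, 416), swapRB=True, crop=False)
net.setInput(blob)
outputs = net.forward(layer_names)

boxes, confidences = [], []
for out in outputs:
    for det in out:                       # det = [cx, cy, bw, bh, objectness, class scores...]
        conf = float(det[4] * det[5:].max())
        if conf > 0.5:
            cx, cy, bw, bh = det[0] * w, det[1] * h, det[2] * w, det[3] * h
            boxes.append([int(cx - bw / 2), int(cy - bh / 2), int(bw), int(bh)])
            confidences.append(conf)

# Non-maximum suppression removes duplicate boxes before counting
keep = cv2.dnn.NMSBoxes(boxes, confidences, 0.5, 0.4)
print("leaf count:", len(keep))
```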

Publications

  • Type: Journal Articles Status: Published Year Published: 2020 Citation: M. Buzzy, V. Thesma, M. Davoodi, and J. Mohammadpour Velni, Real-Time Plant Leaf Counting Using Deep Object Detection Networks, Sensors, 20(23): 6896, 2020. https://doi.org/10.3390/s20236896