FACTS: A Scalable Cyber Ecosystem for Acquisition, Curation, and Analysis of multispectral UAV image data

Sponsoring Institution

National Institute of Food and Agriculture

Project Status

COMPLETE

Funding Source

AFRI COMPETITIVE GRANT

Reporting Frequency

Annual

Accession No.

1019752

Grant No.

2019-67021-29938

Cumulative Award Amt.

$499,845.00

Proposal No.

2018-09225

Multistate No.

(N/A)

Project Start Date

Sep 1, 2019

Project End Date

Aug 31, 2023

Grant Year

2019

Program Code

[A1541]- Food and Agriculture Cyberinformatics and Tools

Recipient Organization
IOWA STATE UNIVERSITY
2229 Lincoln Way
AMES, IA 50011

Performing Department
Agronomy

Non Technical Summary
A current need in cyber-agriculture is to improve and standardize data collection protocols and to develop curation processes and infrastructure so that machine learning (ML) can contribute to disease detection and mitigation in crops. This project will design and deploy a scalable, sustainable data infrastructure platform that supports the acquisition, curation, and hosting of data (primarily spectral images) collected from small unmanned aerial systems (sUAS), and will apply big data analytics modeling based on deep learning (DL) to this collection to identify diseases in crops. The project objectives are to: (1) develop a standardized approach for sUAS-based multispectral data collection, (2) develop a scalable cyberinfrastructure system for data curation, and (3) develop and deploy cloud-based DL algorithms for disease detection. The project will develop a publicly available, curated digital ecosystem of labeled plant stress data that is accessible through cloud-native computing, empowers users nationwide to accurately and rapidly identify and quantify plant diseases in multiple crops (through transfer learning), and further builds the broader community of shared resources. The proposed data collection, curation, and analysis framework will enable a systems approach to disease identification and will empower various communities (research, farmer, industry) to effectively curate, utilize, and manage data for informed, data-based decision making to further U.S. food and agriculture industries. The long-term goal is to create a framework that can be easily deployed for other major crops, which will lead to improved cost effectiveness and wider scope and applicability of the project outcomes across the U.S. The broader outcomes of this work will enable production of germplasm that is more resistant to critical biotic and abiotic stresses, allowing sustainable farming. Furthermore, the information and communication technology (ICT) tools that are developed will simultaneously enable precision farming, improve profitability, and increase sustainability.

Animal Health Component

40%

Research Effort Categories

Basic

30%

Applied

40%

Developmental

30%

Classification

Knowledge Area (KA)	Subject of Investigation (SOI)	Field of Science (FOS)	Percent
216	1820	1160	100%

Knowledge Area
216 - Integrated Pest Management Systems;

Subject Of Investigation
1820 - Soybean;

Field Of Science
1160 - Pathology;

Keywords

Goals / Objectives
This work has three supporting objectives:
Objective 1: Develop a standardized approach for sUAS-based multispectral data collection. This includes identifying tradeoffs in data collection (oblique versus nadir, image overlap, ground sample distance, resolution) to identify best practices for data collection, curation, and storage.
Objective 2: Develop a scalable cyberinfrastructure system for data curation by formalizing a cloud-based, remotely accessible database management framework (DMF), with imagery and derivative data (ground control points, orthomosaics, point clouds). The DMF will be hosted on CyVerse to allow for upload, storage, retrieval, access, and dissemination of the time-varying sUAS data.
Objective 3: Develop and deploy cloud-based DL algorithms for disease detection. These algorithms will be deployed in a distributed manner on CyVerse for ease of use by the broader community. A key innovation is that the distributed DL protocols will employ advanced learning models with in situ analyses and will avoid expensive and less-secure data movements.

Project Methods
Sample collection: Various soybean fields will be scouted for the desired plant stresses.
Image capture: Following identification of targeted fields, large-scale image capture will proceed using sUAS sensors in hyperspectral ranges, collecting a large number of digital images of various diseases. The collected images will be automatically uploaded to CyVerse for storage and curation using the Calliope platform.
Assessing tradeoffs and identifying best practices: We will quantify the tradeoffs in terms of loss in reconstruction accuracy, compute time, and storage requirements, providing a look-up table from which practitioners can select the appropriate imaging standards given their operation protocols. Structure from Motion Multi-View Stereo (SfM-MVS) techniques will generate 3D multispectral models. CyVerse has a suite of open-source and proprietary licensed software that is optimized for cloud-native and high-performance computing environments for 3D reconstructions.
Big data cyberinfrastructure: Create a foliar-stress-specific big data cyberinfrastructure that will be the first user-friendly, scalable cloud environment extensible to a wide range of plant disease and stress data collection, curation, and model development.
Data storage: Provide cloud-based data storage via the CyVerse Data Store.
Data exploration: Enable data exploration through data visualization by combining the information from multiple data sources in the CyVerse Discovery Environment's Visual Interactive Computing Environment.
Smart data label acquisition for learning: Use label-efficient deep learning to reduce the need for an extensive amount of labeled data.
Cloud-based crowd sourcing: Formalize and investigate the accuracy, speed, and quality metrics of crowd-sourced workers, and provide cloud-based links to this resource via CyVerse.
Cloud-based algorithms for robust deep phenotyping: Develop robust learning techniques for deep neural network architectures, such as RCNN, to solve various detection and localization problems related to plant phenotyping using sUAS data.
Standardization of data analytics framework: We will create an open-source, CyVerse-based (i.e., cloud-based) suite of data preprocessing tools specifically required for plant phenotyping tasks.
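One of the tradeoffs named in Objective 1, ground sample distance (GSD) versus flight height, follows directly from camera geometry, so a row of the proposed look-up table can be sketched in a few lines. The sketch below is illustrative only. The sensor width (17.3 mm) and image size (4608 x 3456 pixels) are assumed values for a Micro Four Thirds camera and are not stated in this report; the 45 mm focal length, the 60 m and 30 m flight heights, and the 80%/70% overlaps are taken from the progress sections of this report. All function names are hypothetical.

```python
# Illustrative sketch: ground sample distance (GSD) and image spacing for a
# nadir flight. Sensor values are ASSUMED and do not come from the report.

SENSOR_WIDTH_MM = 17.3   # assumed Micro Four Thirds sensor width
IMAGE_WIDTH_PX = 4608    # assumed image width in pixels
IMAGE_HEIGHT_PX = 3456   # assumed image height in pixels
FOCAL_LENGTH_MM = 45.0   # lens focal length reported in the text


def gsd_cm_per_px(altitude_m, focal_mm=FOCAL_LENGTH_MM):
    """GSD in cm/pixel: pixel pitch scaled by altitude over focal length."""
    pixel_pitch_mm = SENSOR_WIDTH_MM / IMAGE_WIDTH_PX
    gsd_m = pixel_pitch_mm * altitude_m / focal_mm  # (mm / mm) * m = m
    return gsd_m * 100.0


def footprint_m(altitude_m):
    """Ground footprint (width, height) in meters of a single image."""
    gsd_m = gsd_cm_per_px(altitude_m) / 100.0
    return gsd_m * IMAGE_WIDTH_PX, gsd_m * IMAGE_HEIGHT_PX


def spacing_m(footprint_side_m, overlap):
    """Distance between neighboring images for a given overlap fraction."""
    return footprint_side_m * (1.0 - overlap)


def lookup_row(altitude_m, front_overlap=0.80, side_overlap=0.70):
    """One row of a hypothetical flight-planning look-up table."""
    width_m, height_m = footprint_m(altitude_m)
    return {
        "altitude_m": altitude_m,
        "gsd_cm_per_px": round(gsd_cm_per_px(altitude_m), 2),
        # Assumes the short image side points along the flight direction.
        "shot_spacing_m": round(spacing_m(height_m, front_overlap), 2),
        "line_spacing_m": round(spacing_m(width_m, side_overlap), 2),
    }


if __name__ == "__main__":
    for altitude in (60, 30):
        print(lookup_row(altitude))
```

With these assumed sensor values, the computed GSD rounds to 0.50 cm/pixel at 60 m and 0.25 cm/pixel at 30 m, which is consistent with the values reported for the project's flights.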

Progress 09/01/19 to 08/31/23

Outputs
Target Audience: The target audience reached by our project includes undergraduate students, graduate students, researchers, postdoctoral fellows, and stakeholders.
Changes/Problems: Nothing Reported
What opportunities for training and professional development has the project provided? We are training graduate students, including two women graduate students, to fly drones, collect data, preprocess UAV data, visualize the data, and analyze it using ML tools.
How have the results been disseminated to communities of interest? Yes. A review paper was recently published in the journal Plant Phenomics, along with a book chapter and a further journal article:
1. W. Guo, M. E. Carroll, A. Singh, et al., "UAS-based plant phenotyping for research and breeding applications," Plant Phenomics, vol. 2021, Article ID 9840192, 21 pages, 2021.
2. Singh et al. (2021). High throughput phenotyping in soybean. In "High-throughput Crop Phenotyping," Eds. J. Zhou, H. Nguyen. Springer-Nature.
3. Herr, Andrew W., Adak, Alper, Carroll, Matthew E., Elango, Dinakaran, Kar, Soumyashree, Li, Changying, Jones, Sarah E., Carter, Arron H., Murray, Seth C., Paterson, Andrew, Sankaran, Sindhuja, Singh, Arti, and Singh, Asheesh K. "Unoccupied aerial systems imagery for phenotyping in cotton, maize, soybean, and wheat breeding". Crop Science 63 (4), 2023. Wiley Blackwell (John Wiley & Sons). https://doi.org/10.1002/csc2.21028. https://par.nsf.gov/biblio/10425341.
What do you plan to do during the next reporting period to accomplish the goals? This was the last year of the project, and we have collected a large amount of stress data using UAVs.

Impacts
What was accomplished under these goals?
Objective 1: Develop a standardized approach for sUAS-based multispectral data collection. This includes identifying tradeoffs in data collection (oblique versus nadir, image overlap, ground sample distance, resolution) to identify best practices for data collection, curation, and storage.
a) Aerial phenotyping was conducted using a Matrice 600 Pro and the Zenmuse X5 camera with the Olympus 45 mm focal length lens. Flights were flown with an 80% front overlap and a 70% side overlap. Flights were conducted at 60 and 30 meters, with ground sampling distances of 0.50 and 0.25 cm/pixel, respectively. Flights for iron deficiency chlorosis (IDC) ratings were conducted on the same dates as the manual ratings, between 10 am and 2 pm. Flights for the control field were taken at 37, 44, 49, 63, and 71 DAP. We recently published a review paper in which we identified some best practices in data collection, curation, and storage.
b) A drought nursery was established on Fruitfield coarse sand in Muscatine, IA to screen soybean for tolerance to drought stress conditions. Three replications of a diverse panel of 450 soybean plant introgression lines of MG 0 - IV were screened in 2020, 2021, and 2022, for a total of 4050 plots. UAV-based data were collected at various time points with various sensors each year. The UAV-based analysis pipeline consists of multiple steps. UAV data are stitched in Pix4D, and the orthomosaics are processed through custom Python code to segment plot boundaries and extract plot images (Matt Carroll, IDC, paper in progress). To identify the best sensors and extracted vegetation indices for drought tolerance screening and early detection, a machine learning pipeline is applied for feature extraction and classification. Selection of the best ML methods for feature extraction and classification of drought stress is currently in progress. The resulting methods of analysis will be applied and tested on breeding program data from 2023.
c) Two replications of 412 breeding lines and checks were evaluated in the field for tolerance to iron-deficient conditions. Visual ratings and UAV-based multispectral and RGB data were collected at two time points in 2023.
Objective 2: Develop a scalable cyberinfrastructure system for data curation by formalizing a cloud-based, remotely accessible database management framework (DMF), with imagery and derivative data (ground control points, orthomosaics, point clouds). The DMF will be hosted on CyVerse to allow for upload, storage, retrieval, access, and dissemination of the time-varying sUAS data.
This year we worked on developing a pipeline for data storage on CyVerse. CyVerse maintains a multi-petabyte iRODS data store. Objects in the data store appear as directories and files, in the same way as on a local file system. To begin storing data, our team contributed new data to either private username spaces or a shared project space (/soynomics/rtfacts in /iplant/home/projects). Users' data are by default set to private ownership in each respective user's namespace. Data can be shared within the CyVerse infrastructure with individuals, teams, or the entire CyVerse user community, or shared publicly with the open internet ("anonymous" users), by adding individuals, groups, or public users to the permissions. Project data are uploaded to the data store through a combination of interfaces: browser-based (https://), third-party apps (e.g., CyberDuck, Globus, or FileZilla), or the command line using iRODS iCommands (https://docs.irods.org/master/icommands/user/). The Soynomics/rtfacts public project directory (Table 1) is where UAV data are shared while they are under development. Since we are still analyzing the data, access is currently private; the data will be made public once the analysis is done and the manuscript is published.
When the data are ready for final publication, they will be transferred to the /curated space, called the DataCommons (see Publication for details).

Table 1: Locations of shared data from this FACT project on the CyVerse Data Store. Files can be accessed via the browser (https://), file mounting (WebDav), or command line scripting (iRODS iCommands).
Protocol	Folder Path / Location	Shared	Public
CLI	/iplant/home/shared/soynomics/rtfacts	Yes	No
Browser	https://data.cyverse.org/dav/iplant/home/projects/soynomics/rtfacts	Yes	No
WebDav	davs://data.cyverse.org/dav/iplant/home/projects/soynomics/rtfacts	Yes	No
Browser	https://data.cyverse.org/dav-anon/iplant/commons/community_released/soynomics//rtfacts	Yes	Yes
Browser	https://datacommons.cyverse.org/	Yes	Yes

Data are backed up nightly between the University of Arizona and the Texas Advanced Computing Center (TACC) data mirror. iRODS maintains metadata at the atomic level for all files in the Data Store and tracks their utilization and checksums, which allows for data use metrics and data quality assurance. With iRODS, data provenance and FAIR data principles (Wilkinson et al. 2016) are achieved to a greater degree than with commercially available cloud storage. These data are searchable in CyVerse via an ElasticSearch deployment that indexes all data in the data store. Our data hosted via WebDav can be accessed over https:// as base maps in browser applications (Leaflet), or in GIS desktop software as XYZ Tiles (OpenStreetMap Slippy Maps), Vector Tiles, MBTiles, or WMS/WMTS. For data preprocessing, the CyVerse data science workbench, the Discovery Environment (https://de.cyverse.org), supports user-contributed Docker containers as Tools and Apps. These include remote desktop environments that can be used to run open source or licensed GIS software as well as UAV photogrammetry software.
UAV data processing can take place in the Discovery Environment (https://de.cyverse.org) and CyVerse Atmosphere (https://atmo.cyverse.org) (Merchant et al. 2016), or on XSEDE (Towns et al. 2014) resources (Jetstream, TACC) (Hancock et al. 2018). CyVerse offers free resources of up to 48 CPU cores and 256 GB RAM with NVIDIA GPUs on its DE Workbench. CyVerse partner Jetstream (and soon Jetstream-2) offers additional OpenStack virtual machines (up to 40 cores, 120 GB RAM) through free start-up and research allocations via XSEDE.
Objective 3: Develop and deploy cloud-based DL algorithms for disease detection. These algorithms will be deployed in a distributed manner on CyVerse for ease of use by the broader community. A key innovation is that the distributed DL protocols will employ advanced learning models with in situ analyses and will avoid expensive and less-secure data movements.
The next step will be to use ML tools on CyVerse. CyVerse has integrated Jupyter Lab (Python 3), Agisoft Metashape, CloudCompare, QGIS, and OpenDroneMap as public tools for the project to use in the Discovery Environment. These apps are available to any user with a CyVerse account after validation of their profile with an .edu, .gov, or .org email address with ORCID. Analysis notebooks, code, or scripts can be brought by the researcher into running interactive containers in the DE workbench, where data can be quickly downloaded from the data store and analyses can be conducted or tested for full computational reproducibility. Data hosted on the CyVerse iRODS data store have the option to apply DublinCore (Weibel et al. 1998) and DataCite (Brase 2009) metadata templates. When public-facing metadata are added, the metadata become readable by internet search engines using https://schema.org. CyVerse staff assign digital object identifiers (DOI) to curated data via DataCite, allowing users to cite them in their research publications. After a dataset is published with a DOI, it is made immutable, with no future changes to the data. Data hosted in the project's public repositories can instead stay in the "Community Released" space indefinitely, where they can still be modified.
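Table 1 pairs folder paths in the Data Store with browser and WebDav addresses. That pairing can be sketched as a small helper. This is a minimal illustration which assumes that the URL prefixes shown in Table 1 apply uniformly to any path under /iplant, something this report does not confirm. The function name is hypothetical.

```python
# Illustrative sketch: build the kinds of access addresses listed in Table 1
# from a Data Store path. The prefixes are copied from Table 1; applying them
# to other paths, or using the anonymous endpoint with davs://, is an ASSUMPTION.
from urllib.parse import quote

DAV_HOST = "data.cyverse.org"


def access_urls(irods_path, public=False):
    """Return command-line, browser, and WebDav addresses for one path."""
    if not irods_path.startswith("/iplant/"):
        raise ValueError("expected an absolute path under /iplant/")
    # Table 1 uses 'dav-anon' for publicly readable data and 'dav' otherwise.
    endpoint = "dav-anon" if public else "dav"
    encoded = quote(irods_path)  # keeps '/' and escapes spaces and the like
    return {
        "cli": irods_path,
        "browser": f"https://{DAV_HOST}/{endpoint}{encoded}",
        "webdav": f"davs://{DAV_HOST}/{endpoint}{encoded}",
    }


if __name__ == "__main__":
    for name, address in access_urls("/iplant/home/projects/soynomics/rtfacts").items():
        print(f"{name}: {address}")
```

Keeping the mapping in one function means a change of host or endpoint only has to be made in one place.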

Publications

Type: Journal Articles Status: Published Year Published: 2021 Citation:
1. Guo, W.; Carroll, M.E.; Singh, A.; Swetnam, T.; Merchant, N.; Sarkar, S.; Singh, A.K.; Ganapathysubramanian, B. UAS Based Plant Phenotyping for Research and Breeding Applications. Plant Phenomics 2021, 2021, 9840192.
2. Singh et al. (2021). High throughput phenotyping in soybean. In High-throughput Crop Phenotyping, Eds. J. Zhou, H. Nguyen. Springer-Nature.
3. Herr, Andrew W., Adak, Alper, Carroll, Matthew E., Elango, Dinakaran, Kar, Soumyashree, Li, Changying, Jones, Sarah E., Carter, Arron H., Murray, Seth C., Paterson, Andrew, Sankaran, Sindhuja, Singh, Arti, and Singh, Asheesh K. "Unoccupied aerial systems imagery for phenotyping in cotton, maize, soybean, and wheat breeding". Crop Science 63 (4), 2023. Wiley Blackwell (John Wiley & Sons). https://doi.org/10.1002/csc2.21028. https://par.nsf.gov/biblio/10425341.

Progress 09/01/21 to 08/31/22

Outputs
Target Audience: Researchers, stakeholders including farmers, and graduate and undergraduate students.
Changes/Problems: Nothing Reported
What opportunities for training and professional development has the project provided? Training of graduate students to fly UAVs and collect stress trait data. Demonstrating skill sets to stakeholders in NAPB meeting field tours.
How have the results been disseminated to communities of interest? Yes, through publishing posters and research papers and demonstrating UAV-based capture of traits in the field, especially stress-related traits.
What do you plan to do during the next reporting period to accomplish the goals? Continued progress on the three objectives listed in this project.

Impacts
What was accomplished under these goals?
Objective 1: Develop a standardized approach for sUAS-based multispectral data collection. This includes identifying tradeoffs in data collection (oblique versus nadir, image overlap, ground sample distance, resolution) to identify best practices for data collection, curation, and storage.
A time series GWAS for iron deficiency chlorosis (IDC) tolerance in soybean using aerial imagery was carried out. Aerial phenotyping was conducted using a Matrice 600 Pro and the Zenmuse X5 camera with the Olympus 45 mm focal length lens (DJI Technology Co., Shenzhen, China). Flights were flown with an 80% front overlap and a 70% side overlap. Flights were conducted at 60 and 30 meters, with ground sampling distances of 0.50 and 0.25 cm/pixel, respectively. Flights for IDC ratings were conducted on the same dates as the manual ratings, between 10 am and 2 pm. Flights for the non-IDC field were taken at 37, 44, 49, 63, and 71 DAP. The non-IDC field imaging data were used for canopy growth comparisons.
Objective 2: Develop a scalable cyberinfrastructure system for data curation by formalizing a cloud-based, remotely accessible database management framework (DMF), with imagery and derivative data (ground control points, orthomosaics, point clouds). The DMF will be hosted on CyVerse to allow for upload, storage, retrieval, access, and dissemination of the time-varying sUAS data.
This year we worked on developing a pipeline for data storage on CyVerse. CyVerse maintains a multi-petabyte iRODS data store. Objects in the data store appear as directories and files, in the same way as on a local file system. To begin storing data, we (the Soynomics team) contributed new data to either private username spaces or a shared project space (/soynomics/rtfacts in /iplant/home/projects). Users' data are by default set to private ownership in each respective user's namespace.
Data can be shared within the CyVerse infrastructure with individuals, teams, or the entire CyVerse user community, or shared publicly with the open internet ("anonymous" users), by adding individuals, groups, or public users to the permissions. Project data are uploaded to the data store through a combination of interfaces: browser-based (https://), third-party apps (e.g., CyberDuck, Globus, or FileZilla), or the command line using iRODS iCommands (https://docs.irods.org/master/icommands/user/). The Soynomics/rtfacts public project directory (Table 1) is where UAV data are shared while they are under development. Since we are still analyzing the data, access is currently private; the data will be made public once the analysis is done and the manuscript is published. When the data are ready for final publication, they will be transferred to the /curated space, called the DataCommons (see Publication for details).

Table 1: Locations of shared data from the Soynomics project on the CyVerse Data Store. Files can be accessed via the browser (https://), file mounting (WebDav), or command line scripting (iRODS iCommands).
Protocol	Folder Path / Location	Shared	Public
CLI	/iplant/home/shared/soynomics/rtfacts	Yes	No
Browser	https://data.cyverse.org/dav/iplant/home/projects/soynomics/rtfacts	Yes	No
WebDav	davs://data.cyverse.org/dav/iplant/home/projects/soynomics/rtfacts	Yes	No
Browser	https://data.cyverse.org/dav-anon/iplant/commons/community_released/soynomics//rtfacts	Yes	Yes
Browser	https://datacommons.cyverse.org/	Yes	Yes

Data are backed up nightly between the University of Arizona and the Texas Advanced Computing Center (TACC) data mirror. iRODS maintains metadata at the atomic level for all files in the Data Store and tracks their utilization and checksums, which allows for data use metrics and data quality assurance. With iRODS, data provenance and FAIR data principles (Wilkinson et al. 2016) are achieved to a greater degree than with commercially available cloud storage. These data are searchable in CyVerse via an ElasticSearch deployment that indexes all data in the data store. Soynomics data hosted via WebDav can be accessed over https:// as base maps in browser applications (Leaflet), or in GIS desktop software as XYZ Tiles (OpenStreetMap Slippy Maps), Vector Tiles, MBTiles, or WMS/WMTS. For data preprocessing, the CyVerse data science workbench, the Discovery Environment (https://de.cyverse.org), supports user-contributed Docker containers as Tools and Apps. These include remote desktop environments that can be used to run open source or licensed GIS software as well as UAV photogrammetry software. UAV data processing can take place in the Discovery Environment (https://de.cyverse.org) and CyVerse Atmosphere (https://atmo.cyverse.org) (Merchant et al. 2016), or on XSEDE (Towns et al. 2014) resources (Jetstream, TACC) (Hancock et al. 2018). CyVerse offers free resources of up to 48 CPU cores and 256 GB RAM with NVIDIA GPUs on its DE Workbench. CyVerse partner Jetstream (and soon Jetstream-2) offers additional OpenStack virtual machines (up to 40 cores, 120 GB RAM) through free start-up and research allocations via XSEDE.
Objective 3: Develop and deploy cloud-based DL algorithms for disease detection. These algorithms will be deployed in a distributed manner on CyVerse for ease of use by the broader community. A key innovation is that the distributed DL protocols will employ advanced learning models with in situ analyses and will avoid expensive and less-secure data movements.
To classify iron deficiency chlorosis (IDC) in soybean, the TensorFlow Python library was used to develop a Keras Sequential prediction model.
The model had three convolutional layers and one fully connected layer, each activated by a ReLU activation function, as well as a dropout layer that randomly disabled 20% of the neurons in the network to reduce overfitting. To train the model, the training images were first converted into three-dimensional NumPy arrays of size 365 x 365 x 3, representing the image width in pixels, the image height in pixels, and the RGB values of each pixel, respectively. The three-dimensional NumPy arrays were then appended into a four-dimensional NumPy array that contained all the training images. The model was then fit to the training data using a batch size of 32 images over 30 epochs.
To make deep-learning methods more practical for high-throughput stress phenotyping, the CyVerse Discovery Environment was used to run analyses in the cloud instead of on local devices. To do so, a tool was created in the Discovery Environment that pulls from the official TensorFlow Docker image. The tool was then used to create a Visual Interactive Computing Environment app that runs a TensorFlow Docker container, ensuring that the dependencies and the latest version of TensorFlow are installed in the computing environment. This process sets up an isolated virtual environment on the CyVerse machine where the Python script can be executed, allowing analyses to be performed remotely without taxing local resources.
Analysis notebooks, code, or scripts can be brought by the researcher into running interactive containers in the DE workbench, where data can be quickly downloaded from the data store and analyses can be conducted or tested for full computational reproducibility. Data hosted on the CyVerse iRODS data store have the option to apply DublinCore (Weibel et al. 1998) and DataCite (Brase 2009) metadata templates. When public-facing metadata are added, the metadata become readable by internet search engines using https://schema.org. CyVerse staff assign digital object identifiers (DOI) to curated data via DataCite, allowing users to cite them in their research publications. After a dataset is published with a DOI, it is made immutable, with no future changes to the data. Data hosted in the Soynomics public repositories can instead stay in the "Community Released" space indefinitely, where they can still be modified.
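The IDC classifier described under Objective 3 can be sketched as follows. This is a minimal sketch and not the project's code. The report gives only the layer counts, the ReLU activation, the 20% dropout, the 365 x 365 x 3 input size, the batch size of 32, and the 30 epochs. The filter counts, kernel sizes, pooling layers, dense-layer width, separate output layer, number of classes, pixel scaling, optimizer, and loss are all assumptions, as are the function names. NumPy and TensorFlow are imported inside the functions that need them so that the batch-count helper can be used on its own.

```python
# Illustrative sketch of a Keras Sequential image classifier.
# Everything marked ASSUMED is not stated in the report.
import math

IMAGE_SHAPE = (365, 365, 3)  # per-image array size given in the report
BATCH_SIZE = 32              # from the report
EPOCHS = 30                  # from the report
DROPOUT_RATE = 0.2           # from the report


def batches_per_epoch(n_images, batch_size=BATCH_SIZE):
    """Number of gradient steps needed to see every image once."""
    return math.ceil(n_images / batch_size)


def stack_images(images):
    """Stack 3D image arrays into one 4D array of shape (n, 365, 365, 3)."""
    import numpy as np

    batch = np.stack(images).astype("float32")
    if batch.shape[1:] != IMAGE_SHAPE:
        raise ValueError(f"expected images of shape {IMAGE_SHAPE}")
    return batch / 255.0  # ASSUMED: 8-bit RGB values scaled to [0, 1]


def build_model(num_classes=5):  # ASSUMED number of rating classes
    import tensorflow as tf

    layers = tf.keras.layers
    model = tf.keras.Sequential([
        tf.keras.Input(shape=IMAGE_SHAPE),
        # Three convolutional layers with ReLU; filters and pooling ASSUMED.
        layers.Conv2D(16, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(32, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Dropout(DROPOUT_RATE),
        layers.Flatten(),
        # One fully connected ReLU layer; its width is ASSUMED.
        layers.Dense(128, activation="relu"),
        # ASSUMED separate output layer producing one logit per class.
        layers.Dense(num_classes),
    ])
    model.compile(
        optimizer="adam",  # ASSUMED
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        metrics=["accuracy"],
    )
    return model


def train(images, labels, num_classes=5):
    """Fit the sketch model on integer class labels."""
    import numpy as np

    x = stack_images(images)
    y = np.asarray(labels)
    model = build_model(num_classes)
    model.fit(x, y, batch_size=BATCH_SIZE, epochs=EPOCHS)
    return model
```

A script of this shape is the kind of Python file that could be executed inside the TensorFlow container described above, with the image arrays read from the data store beforehand.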

Publications

Type: Journal Articles Status: Published Year Published: 2022 Citation: Chiteri KO, TZ Jubery, S Dutta, B Ganapathysubramanian, S Cannon, A Singh*. 2022. Dissecting the root phenotypic and genotypic variability of the Iowa mung bean diversity panel. Frontiers in Plant Science, p.3128. doi: 10.3389/fpls.2021.808001.
Type: Journal Articles Status: Submitted Year Published: 2022 Citation: Time series GWAS for Iron Deficiency Chlorosis tolerance in soybean using aerial imagery. Matt Carroll, Sahishnu Hanamolu, Ashlyn Rairdin, Antonella Ferela, Soumik Sarkar, Baskar Ganapathysubramanian, Arti Singh*, Asheesh K. Singh*
Type: Journal Articles Status: Submitted Year Published: 2022 Citation: Deep learning-based phenotyping for genome wide association studies of sudden death syndrome in soybean. Ashlyn Rairdin, Fateme Fotouhi, Jiaoping Zhang, Daren S. Mueller, Baskar Ganapathysubramanian, Asheesh K. Singh, Somak Dutta, Soumik Sarkar, Arti Singh

Progress 09/01/20 to 08/31/21

Outputs
Target Audience: The target audience is research scientists, postdoctoral fellows, and undergraduate and graduate students working in the field of phenomics and high-throughput phenotyping using UAVs.
Changes/Problems: Nothing Reported
What opportunities for training and professional development has the project provided? We have trained graduate students, including two women graduate students, to fly drones, collect data, preprocess UAV data, visualize the data, and analyze it using ML tools. Our women graduate students are also participating in a college fest to showcase the capability of UAVs/drones and how they are using UAVs in their research projects. Recently, students participated in soyfest at the college. In addition, these students are participating in field visits attended by other researchers and graduate students to showcase how UAV technology is being used to answer their research questions and how it can be a powerful tool for farmers in the future.
How have the results been disseminated to communities of interest? Yes. Recently, a review paper was published in the journal Plant Phenomics: W. Guo, M. E. Carroll, A. Singh, et al., "UAS-based plant phenotyping for research and breeding applications," Plant Phenomics, vol. 2021, Article ID 9840192, 21 pages, 2021.
What do you plan to do during the next reporting period to accomplish the goals? Next year we will work on the third objective of our project while collecting more stress data using UAVs.

Impacts
What was accomplished under these goals?
Impact this Project Year: The main goal of the project is to develop scalable cyberinfrastructure for UAV data acquisition, curation, and analysis. This year we worked on creating a framework that uses CyVerse cyberinfrastructure to do this in a seamless manner. The work initiated this year toward an end-to-end analysis framework will help other researchers in academia and industry follow standard practice when dealing with such complex data, and will provide better insight into the data so that automated stress detection and mitigation can be provided in a timely manner. It will also allow multiple institutions to work on the data more collaboratively.
Objective 1: Develop a standardized approach for sUAS-based multispectral data collection. This includes identifying tradeoffs in data collection (oblique versus nadir, image overlap, ground sample distance, resolution) to identify best practices for data collection, curation, and storage.
We have collected UAV data on iron deficiency chlorosis (IDC) stress. Aerial phenotyping was conducted using a Matrice 600 Pro and the Zenmuse X5 camera with the Olympus 45 mm focal length lens. Flights were flown with an 80% front overlap and a 70% side overlap. Flights were conducted at 60 and 30 meters, with ground sampling distances of 0.50 and 0.25 cm/pixel, respectively. Flights for IDC ratings were conducted on the same dates as the manual ratings, between 10 am and 2 pm. Flights for the control field were taken at 37, 44, 49, 63, and 71 DAP. We recently published a review paper in which we identified some best practices in data collection, curation, and storage.
Objective 2: Develop a scalable cyberinfrastructure system for data curation by formalizing a cloud-based, remotely accessible database management framework (DMF), with imagery and derivative data (ground control points, orthomosaics, point clouds). The DMF will be hosted on CyVerse to allow for upload, storage, retrieval, access, and dissemination of the time-varying sUAS data.
The second goal is to provide a remotely and publicly available database management framework (DMF) that allows for the uploading, storing, and retrieval of sUAS-based multispectral data as well as derivative data. The DMF will be available on CyVerse. This year we created a dedicated folder for the FACT project on CyVerse (https://cyverse.org/) to store data. The folder is publicly accessible at /iplant/home/projects/FACT. To access the data, a user needs to create an account on CyVerse. These data are easily searchable in CyVerse via an ElasticSearch deployment that indexes all data in the folder. A description of this DMF follows later in this section. So far, we have uploaded sample sUAS-captured RGB image data to the FACT project folder. Currently, we are compiling a standard list of derivative data or metadata that will be uploaded along with each image dataset. The metadata include ground control points, camera/sensor parameters, weather parameters, design of experiment, cultivar information, etc. We developed a framework using the CyVerse data science workbench for image pre-processing, which includes color correction, noise reduction, orthomosaic creation, and plot extraction. A description of this workbench follows later in this section. We are currently preprocessing the RGB images to generate orthomosaics and extract plots. We also set up a Jupyter notebook framework to develop models to classify images based on disease ratings. Currently, we are preparing data to train a model. The trained model will be made public.
CyVerse maintains a multi-petabyte iRODS data store. Objects in the data store appear as directories and files, in the same way as on a local file system.
To begin the storage process, our team contributed new data to either private user namespaces or a shared project space (/soynomics/rtfacts in the /iplant/home/projects). Users' data are by default set to private ownership in each respective user's namespace. Data can be shared within the CyVerse infrastructure with individuals, teams, or the entire CyVerse user community, or shared publicly with the open internet ("anonymous" users), by adding individuals, groups, or public users to the permissions. Project data are uploaded to the data store through a combination of interfaces: browser-based https://, third-party apps (e.g., Cyberduck, Globus, or FileZilla), or the command line using iRODS iCommands (https://docs.irods.org/master/icommands/user/). The Soynomics/rtfacts public project directory (Table 1) is where UAV data are shared while they are under development. Because we are still analyzing the data, access is currently private; the data will be made public once the analysis is done and the manuscript is published. When the data are ready for final publication, they will be transferred to the /curated space, called the DataCommons.
iRODS maintains metadata at the atomic level for all files in the Data Store and tracks their utilization and checksums, which allows for data use metrics and data quality assurance. With iRODS, data provenance and FAIR data principles (Wilkinson et al. 2016) are achieved to a greater degree than with commercially available cloud storage. These data are searchable in CyVerse via an ElasticSearch deployment, which indexes all data in the data store. Our data hosted via WebDAV can be accessed over https:// as base maps in browser applications (Leaflet), or in GIS desktop software as XYZ Tiles (OpenStreetMap Slippy Maps), Vector Tiles, MBTiles, or WMS/WMTS.
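The checksum tracking described above can be mirrored on the local side before upload. The following minimal sketch builds a checksum manifest for a local directory and compares two manifests to spot changed or missing files. It uses SHA-256 from the Python standard library as an assumption. The checksum algorithm actually configured in the CyVerse data store is not stated in this report, and the function names are illustrative.

```python
import hashlib
import tempfile
from pathlib import Path


def file_sha256(path, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 of a file, read in chunks to handle large images."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(directory) -> dict:
    """Map each file's path (relative to the directory) to its checksum."""
    root = Path(directory)
    return {
        path.relative_to(root).as_posix(): file_sha256(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def changed_files(before: dict, after: dict) -> list:
    """Files that were added, removed, or altered between two manifests."""
    names = before.keys() | after.keys()
    return sorted(n for n in names if before.get(n) != after.get(n))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "flight_01.txt").write_bytes(b"abc")  # stand-in for an image
        first = build_manifest(tmp)
        (Path(tmp) / "flight_01.txt").write_bytes(b"abd")  # simulate corruption
        second = build_manifest(tmp)
        print(changed_files(first, second))
```

A manifest written at collection time gives a reference point that can later be compared against the checksums reported by the data store after transfer.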
For data preprocessing, CyVerse provides its data science workbench, the Discovery Environment (https://de.cyverse.org), which supports user-contributed Docker containers as Tools and Apps.
Objective 3: Develop and deploy cloud-based DL algorithms for disease detection. These algorithms will be deployed in a distributed manner on CyVerse for ease of use by the broader community. A key innovation is the use of distributed DL protocols that employ advanced learning models with in situ analyses and avoid expensive and less secure data movement. The next step will be to use ML tools on CyVerse. CyVerse has integrated JupyterLab (Python 3), Agisoft Metashape, CloudCompare, QGIS, and OpenDroneMap as public tools for the project to use in the Discovery Environment. These apps are available to any user with a CyVerse account after validation of their profile with an .edu, .gov, or .org email address and ORCID. Researchers can bring analysis notebooks, code, or scripts into running interactive containers in the DE workbench, where data can be quickly downloaded from the data store and analyses conducted or tested for full computational reproducibility.
Data hosted on the CyVerse iRODS data store can optionally be described with Dublin Core (Weibel et al. 1998) and DataCite (Brase 2009) metadata templates. When public-facing metadata are added, they become readable by internet search engines using https://schema.org. CyVerse staff assign digital object identifiers (DOIs) to curated data via DataCite, allowing users to cite them in their research publications. After a dataset is published with a DOI, it is made immutable, with no future changes to the data. Data hosted in the project's public repositories can stay in the "Community Released" space indefinitely, where they can still be modified.
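To illustrate how Dublin Core style metadata can become readable through https://schema.org, the sketch below maps a small Dublin Core record to a schema.org Dataset description in JSON-LD. The mapping table, function name, and all record values (including the placeholder identifier) are hypothetical and shown for illustration only. The report does not describe how CyVerse performs this conversion internally.

```python
import json

# Illustrative mapping from Dublin Core elements to schema.org Dataset
# properties. This is an assumption, not the CyVerse implementation.
DC_TO_SCHEMA = {
    "title": "name",
    "description": "description",
    "creator": "creator",
    "subject": "keywords",
    "date": "datePublished",
    "identifier": "identifier",
}


def dublin_core_to_jsonld(record: dict) -> dict:
    """Build a schema.org Dataset document from a Dublin Core style dict."""
    document = {"@context": "https://schema.org", "@type": "Dataset"}
    for dc_key, schema_key in DC_TO_SCHEMA.items():
        if dc_key in record:
            document[schema_key] = record[dc_key]
    return document


if __name__ == "__main__":
    example = {
        "title": "Example sUAS soybean stress imagery",  # hypothetical values
        "creator": ["Example Author"],
        "subject": ["UAV", "soybean", "plant stress"],
        "identifier": "doi:10.xxxx/placeholder",          # placeholder, not a real DOI
    }
    print(json.dumps(dublin_core_to_jsonld(example), indent=2))
```

Elements absent from the input record are simply omitted from the output, so a partially completed template still yields a valid, if sparse, description.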

Publications

Type: Journal Articles Status: Published Year Published: 2021 Citation: Guo, W.; Carroll, M.E.; Singh, A.; Swetnam, T.; Merchant, N.; Sarkar, S.; Singh, A.K.; Ganapathysubramanian, B. UAS Based Plant Phenotyping for Research and Breeding Applications. Plant Phenomics 2021, 9840192.

Progress 09/01/19 to 08/31/20

Outputs
Target Audience: The target audience reached by our project includes undergraduate students, researchers, and postdoctoral fellows.
Changes/Problems: Nothing Reported
What opportunities for training and professional development has the project provided? Two PhD students were trained to obtain licenses to fly UAVs for high-throughput, more precise and accurate phenotyping of various plant stress traits. The trained students are also showcasing their data collection campaigns to undergraduate students who come for field visits.
How have the results been disseminated to communities of interest? Due to the COVID-19 situation, the field days and farm progress shows used to connect with farmers were not conducted.
What do you plan to do during the next reporting period to accomplish the goals? Next year we will continue work on Objectives 1 and 2, continuing data collection and its seamless integration with data curation on the CyVerse platform.

Impacts
What was accomplished under these goals? In the first year of the project, we focused on our first objective, which is to develop standardized, scalable, and sustainable approaches for sUAS-based multispectral (UBM) data collection. The UAV was flown with hyperspectral and RGB cameras at three different heights to collect data on abiotic stress traits (for example, iron deficiency chlorosis and drought tolerance) and biotic stresses (for example, sudden death syndrome and frogeye leaf spot) in soybean, in experimental fields around Ames, Iowa. A plant phenotyping protocol for measuring plant stress traits with a UAV-mounted hyperspectral camera was developed. This includes identifying tradeoffs in imaging (image overlap, image size, image resolution) and storage (compression) to identify and suggest viable best practices for data collection.

Publications

Type: Journal Articles Status: Published Year Published: 2020 Citation: Arti Singh, Sarah Jones, Baskar Ganapathysubramanian, Soumik Sarkar, Daren Mueller, Kulbir Sandhu, Koushik Nagasubramanian. Challenges and Opportunities in Machine-Augmented Plant Stress Phenotyping. Trends in Plant Science, 2020.