Source: IOWA STATE UNIVERSITY submitted to
FACTS: A SCALABLE CYBER ECOSYSTEM FOR ACQUISITION, CURATION, AND ANALYSIS OF MULTISPECTRAL UAV IMAGE DATA
Sponsoring Institution
National Institute of Food and Agriculture
Project Status
NEW
Funding Source
Reporting Frequency
Annual
Accession No.
1019752
Grant No.
2019-67021-29938
Project No.
IOW05587
Proposal No.
2018-09225
Multistate No.
(N/A)
Program Code
A1541
Project Start Date
Sep 1, 2019
Project End Date
Aug 31, 2022
Grant Year
2019
Project Director
Singh, A.
Recipient Organization
IOWA STATE UNIVERSITY
2229 Lincoln Way
AMES,IA 50011
Performing Department
Agronomy
Non Technical Summary
A current need in cyber-agriculture is to improve and standardize data collection protocols and to develop curation processes and infrastructure that allow machine learning (ML) to contribute to disease detection and mitigation in crops. This project will design and deploy a scalable, sustainable data infrastructure platform that supports the acquisition, curation, and hosting of data (primarily spectral images) collected from small unmanned aerial systems (sUAS), and will apply big data analytics to this collection using deep learning (DL) to identify diseases in crops. Project objectives are to (1) develop a standardized approach for sUAS-based multispectral data collection, (2) develop a scalable cyberinfrastructure system for data curation, and (3) develop and deploy cloud-based DL algorithms for disease detection. The project will produce a publicly available, curated digital ecosystem of labeled plant stress data that is accessible through cloud-native computing, empowers users nationwide to accurately and rapidly identify and quantify plant diseases in multiple crops (through transfer learning), and further builds the broader community of shared resources. The proposed data collection, curation, and analysis framework will enable a systems approach to disease identification and will empower various communities (research, farmer, industry) to effectively curate, utilize, and manage data for informed, data-based decision making to further U.S. food and agriculture industries. The long-term goal is to create a framework that can be easily deployed for other major crops, which will improve cost effectiveness and widen the scope and applicability of the project outcomes across the U.S. The broader outcomes of this work will enable production of germplasm that is more resistant to critical biotic and abiotic stresses, allowing sustainable farming.
Furthermore, the information and communication technology (ICT) tools that are developed will simultaneously enable precision farming, improve profitability, and increase sustainability.
Animal Health Component
0%
Research Effort Categories
Basic
30%
Applied
40%
Developmental
30%
Classification

Knowledge Area (KA) | Subject of Investigation (SOI) | Field of Science (FOS) | Percent
216 | 1820 | 1160 | 100%
Knowledge Area
216 - Integrated Pest Management Systems;

Subject Of Investigation
1820 - Soybean;

Field Of Science
1160 - Pathology;
Goals / Objectives
This work has three supporting objectives:
Objective 1: Develop a standardized approach for sUAS-based multispectral data collection. This includes identifying tradeoffs in data collection (oblique versus nadir, image overlap, ground sample distance, resolution) to identify best practices for data collection, curation, and storage.
Objective 2: Develop a scalable cyberinfrastructure system for data curation by formalizing a cloud-based, remotely accessible database management framework (DMF), with imagery and derivative data (ground control points, orthomosaics, point clouds). The DMF will be hosted on CyVerse to allow for upload, storage, retrieval, access, and dissemination of the time-varying sUAS data.
Objective 3: Develop and deploy cloud-based DL algorithms for disease detection. These algorithms will be deployed in a distributed manner on CyVerse for ease of use by the broader community. A key innovation is that the distributed DL protocols will employ advanced learning models with in situ analyses and will avoid expensive and less-secure data movements.
Project Methods
Sample collection: Various soybean fields will be scouted for the desired plant stresses.
Image capture: Following identification of targeted fields, large-scale image capture will proceed using sUAS sensors in hyperspectral ranges, collecting a large volume of digital images of various diseases. The collected images will be automatically uploaded to CyVerse for storage and curation using the Calliope platform.
Assessing trade-offs and identifying best practices: We will quantify the trade-offs in terms of loss in reconstruction accuracy, compute time, and storage requirements, providing a look-up table that lets practitioners identify and select the appropriate imaging standards given their operating protocols. Structure from Motion Multi-View Stereo (SfM-MVS) techniques will generate 3D multispectral models. CyVerse has a suite of open-source and proprietary licensed software that is optimized for cloud-native and high-performance computing environments for 3D reconstructions.
Big data cyberinfrastructure: Create a foliar-stress-specific big data cyberinfrastructure that will be the first user-friendly, scalable cloud environment that is extensible to a wide range of plant disease and stress data collection, curation, and model development.
Data storage: Provide cloud-based data storage via the CyVerse Data Store.
Data exploration: Enable data exploration through data visualization by combining the information from multiple data sources in CyVerse Discovery Environment's Visualize and Interactive Computing Environment.
Smart data label acquisition for learning: Use label-efficient deep learning to reduce the need for an extensive amount of labeled data.
Cloud-based crowd sourcing: Formalize and investigate the accuracy, speed, and quality metrics of crowd-sourced workers, and provide cloud-based links to this resource via CyVerse.
Cloud-based algorithms for robust deep phenotyping: Develop robust learning techniques for deep neural network architectures, such as RCNN, to solve various detection and localization problems related to plant phenotyping using sUAS data.
Standardization of data analytics framework: We will create an open-source, CyVerse-based (i.e., cloud-based) suite of data preprocessing tools specifically required for plant phenotyping tasks.
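The look-up table described in the methods above, relating imaging choices to storage cost, could take a form like the following minimal sketch. It covers only the geometric part of the trade-off (resolution, image count, storage); reconstruction accuracy and compute time would have to be measured. The camera values, field names, and the fixed megabytes-per-image figure are hypothetical placeholders, not project settings, and the sketch assumes nadir images with the long sensor side across the flight line.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Camera:
    """Hypothetical camera description; every value is a placeholder."""
    sensor_width_mm: float
    sensor_height_mm: float
    image_width_px: int
    focal_length_mm: float
    megabytes_per_image: float


def tradeoff_row(cam, altitude_m, front_overlap, side_overlap):
    """One look-up-table row for a nadir flight at the given altitude and overlaps."""
    # Ground footprint of one image in metres (similar triangles).
    scale = altitude_m / cam.focal_length_mm
    width_m = cam.sensor_width_mm * scale
    height_m = cam.sensor_height_mm * scale
    # Spacing between exposures along track and between flight lines.
    along_track_m = height_m * (1.0 - front_overlap)
    across_track_m = width_m * (1.0 - side_overlap)
    images_per_ha = 10_000.0 / (along_track_m * across_track_m)
    return {
        "altitude_m": altitude_m,
        "front_overlap": front_overlap,
        "side_overlap": side_overlap,
        "gsd_cm_per_px": 100.0 * width_m / cam.image_width_px,
        "images_per_ha": images_per_ha,
        "storage_gb_per_ha": images_per_ha * cam.megabytes_per_image / 1000.0,
    }


def build_lookup(cam, altitudes_m, overlaps):
    """Rows for every combination of altitude and (front, side) overlap pair."""
    return [tradeoff_row(cam, alt, front, side)
            for alt, (front, side) in product(altitudes_m, overlaps)]


if __name__ == "__main__":
    camera = Camera(20.0, 15.0, 4000, 40.0, 10.0)  # placeholder values
    for row in build_lookup(camera, [20.0, 40.0, 80.0], [(0.8, 0.7), (0.7, 0.6)]):
        print({key: round(value, 2) for key, value in row.items()})
```

In this simplified model, halving the flight altitude halves the ground sample distance but quadruples the number of images, and therefore the storage, per hectare.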

Progress 09/01/20 to 08/31/21

Outputs
Target Audience: The target audience is research scientists, postdoctoral fellows, and undergraduate and graduate students working in the field of phenomics and high-throughput phenotyping using UAVs.
Changes/Problems: Nothing Reported
What opportunities for training and professional development has the project provided? We have trained graduate students, including two women graduate students, to fly drones, collect data, preprocess UAV data, visualize the data, and analyze it using ML tools. Our women graduate students are also participating in a college fest to showcase the capabilities of UAVs/drones and how they are using UAVs in their research projects. Recently, students participated in soyfest at the college. In addition, these students are participating in field visits attended by other researchers and graduate students to showcase how UAV technology is being used to answer their research questions and how it can be a powerful tool for farmers in the future.
How have the results been disseminated to communities of interest? Yes, a review paper was recently published in the journal Plant Phenomics: W. Guo, M. E. Carroll, A. Singh et al., "UAS-based plant phenotyping for research and breeding applications," Plant Phenomics, vol. 2021, Article ID 9840192, 21 pages, 2021.
What do you plan to do during the next reporting period to accomplish the goals? Next year we will work on the third objective of our project and will simultaneously collect more stress data using UAVs.

Impacts
What was accomplished under these goals?
Impact this Project Year: The main goal of the project is to develop scalable cyberinfrastructure for UAV data acquisition, curation, and analysis. This year we worked on creating a framework that uses the CyVerse cyberinfrastructure to do this in a seamless manner. The work initiated this year toward an end-to-end analysis framework will help other researchers in academia and industry follow standard practice when dealing with such complex data, and will provide better insight into the data so that automated stress detection and mitigation can be provided in a timely manner. It will also allow multiple institutions to work on the data more collaboratively.
Objective 1: Develop a standardized approach for sUAS-based multispectral data collection. This includes identifying tradeoffs in data collection (oblique versus nadir, image overlap, ground sample distance, resolution) to identify best practices for data collection, curation, and storage. We have collected UAV data on iron deficiency chlorosis (IDC) stress. Aerial phenotyping was conducted using a Matrice 600 Pro and the Zenmuse X5 camera with the Olympus 45 mm focal length lens. Flights were flown with an 80% front overlap and a 70% side overlap. Flights were conducted at 60 and 30 meters, with ground sampling distances of 0.50 and 0.25 cm/pixel, respectively. Flights for IDC ratings were conducted on the same dates as the manual ratings, between 10 am and 2 pm. Flights for the control field were taken at 37, 44, 49, 63, and 71 days after planting (DAP). We recently published a review paper in which we identified some best practices in data collection, curation, and storage.
Objective 2: Develop a scalable cyberinfrastructure system for data curation by formalizing a cloud-based, remotely accessible database management framework (DMF), with imagery and derivative data (ground control points, orthomosaics, point clouds).
The DMF will be hosted on CyVerse to allow for upload, storage, retrieval, access, and dissemination of the time-varying sUAS data. The second goal is to provide a remotely and publicly available database management framework (DMF) that allows for the uploading, storing, and retrieval of sUAS-based multispectral data as well as derivative data. The DMF will be available on CyVerse. This year we created a dedicated folder for the FACT project on CyVerse (https://cyverse.org/) to store data. The folder is publicly accessible at /iplant/home/projects/FACT. To access the data, a user needs to create an account on CyVerse. These data are easily searchable in CyVerse via an ElasticSearch deployment, which indexes all data in the folder. The description of this DMF can be found in the following paragraph. So far, we have uploaded sample sUAS-captured RGB image data to the FACT project folder. We are currently compiling a standard list of derivative data or metadata that will be uploaded along with each image dataset. The metadata include ground control points, camera/sensor parameters, weather parameters, design of experiment, cultivar information, etc. We developed a framework using the CyVerse data science workbench for image preprocessing, which includes color correction, noise reduction, orthomosaic creation, and plot extraction. The description of this workbench can be found in the following paragraph. We are currently preprocessing the RGB images to generate orthomosaics and extract plots. We also set up a Jupyter notebook framework to develop models that classify images based on disease ratings. Currently, we are preparing data to train a model. The trained model will be made public. CyVerse maintains a multi-petabyte iRODS data store. Objects in the data store appear as directories and files, in the same way as on a local file system.
To start the storage process, our team contributed new data to either private user namespaces or a shared project space (/soynomics/rtfacts in /iplant/home/projects). Users' data are by default set to private ownership in each respective user's namespace. Data can be shared within the CyVerse infrastructure with individuals, teams, or the entire CyVerse user community, or shared publicly with the open internet ("anonymous" users), by adding individuals, groups, or public users to the permissions. Project data are uploaded to the data store through a combination of interfaces: browser-based access over https://, third-party apps (e.g., CyberDuck, Globus, or FileZilla), or the command line using iRODS icommands (https://docs.irods.org/master/icommands/user/). The Soynomics/rtfacts public project directory (Table 1) is where UAV data are shared while they are under development. Because we are still analyzing the data, access to the data is currently private; the data will be made public once the analysis is done and the manuscript is published. When the data are ready for final publication, they will be transferred to the /curated space, called the DataCommons. iRODS maintains metadata at the atomic level for all files in the Data Store and tracks their utilization and checksums, which allows for data use metrics and data quality assurance. With iRODS, data provenance and FAIR data principles (Wilkinson et al. 2016) are achieved to a greater degree than with commercially available cloud storage. These data are searchable in CyVerse via an ElasticSearch deployment, which indexes all data in the data store. Our data hosted via WebDAV can be accessed over https:// as base maps in browser applications (Leaflet), or in GIS desktop software as XYZ Tiles (OpenStreetMap Slippy Maps), Vector Tiles, MBTiles, or WMS/WMTS.
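The checksums that iRODS tracks can also be used on the user side to confirm that a local copy of an image matches what is recorded in the Data Store. The following is a minimal sketch, assuming the recorded value is either an MD5 hex digest or a SHA-256 digest written as "sha2:" followed by its base64 encoding; which scheme a given CyVerse collection uses is an assumption here, and the function names are illustrative only.

```python
import base64
import hashlib


def file_checksums(path, chunk_size=1 << 20):
    """Return MD5 (hex) and SHA-256 ("sha2:" + base64) checksums of a local file."""
    md5 = hashlib.md5()
    sha256 = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large orthomosaics do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            md5.update(chunk)
            sha256.update(chunk)
    return {
        "md5": md5.hexdigest(),
        "sha2": "sha2:" + base64.b64encode(sha256.digest()).decode("ascii"),
    }


def matches_recorded(path, recorded_checksum):
    """True if the recorded checksum equals either locally computed form."""
    return recorded_checksum in file_checksums(path).values()
```

The recorded value would be copied from the Data Store listing for the file; a mismatch after a transfer suggests the file should be transferred again.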
For data preprocessing, CyVerse provides its data science workbench, the Discovery Environment (https://de.cyverse.org), which supports user-contributed Docker containers as Tools and Apps.
Objective 3: Develop and deploy cloud-based DL algorithms for disease detection. These algorithms will be deployed in a distributed manner on CyVerse for ease of use by the broader community. A key innovation is that the distributed DL protocols will employ advanced learning models with in situ analyses and will avoid expensive and less-secure data movements. The next step will be to use ML tools on CyVerse. CyVerse has integrated Jupyter Lab (Python 3), Agisoft Metashape, CloudCompare, QGIS, and OpenDroneMap as public tools for the project to use in the Discovery Environment. These apps are available to any user with a CyVerse account after validation of their profile with an .edu, .gov, or .org email address with ORCID. Analysis notebooks, code, or scripts can be brought by the researcher into running interactive containers in the DE workbench, where data can be quickly downloaded from the data store and analyses can be conducted or tested for full computational reproducibility. Data hosted on the CyVerse iRODS data store can have DublinCore (Weibel et al. 1998) and DataCite (Brase 2009) metadata templates applied. When public-facing metadata are added, the metadata become readable by internet search engines using https://schema.org. CyVerse staff assign digital object identifiers (DOIs) to curated data via DataCite, allowing users to cite the data in their research publications. After a dataset is published with a DOI, it is made immutable, with no future changes to the data. Data hosted in the project's public repositories can stay in the "Community Released" space indefinitely, allowing them to be modified.
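The ground sampling distances reported under Objective 1 can be checked from the camera geometry. The sketch below assumes a Zenmuse X5 sensor of about 17.3 mm x 13.0 mm recording 4608 pixels across its width; these sensor figures are not stated in this report and are assumptions, as are the function names. The altitude, focal length, and overlap values are the ones reported. The shot spacing additionally assumes nadir images with the long sensor side across the flight line.

```python
# Assumed Zenmuse X5 sensor geometry (not given in the report).
SENSOR_WIDTH_MM = 17.3
SENSOR_HEIGHT_MM = 13.0
IMAGE_WIDTH_PX = 4608
# Reported in the text: Olympus 45 mm lens.
FOCAL_LENGTH_MM = 45.0


def ground_sample_distance_cm(altitude_m, sensor_width_mm=SENSOR_WIDTH_MM,
                              image_width_px=IMAGE_WIDTH_PX,
                              focal_length_mm=FOCAL_LENGTH_MM):
    """Ground distance covered by one pixel, in centimetres, for a nadir image."""
    pixel_pitch_mm = sensor_width_mm / image_width_px
    metres_per_pixel = pixel_pitch_mm * altitude_m / focal_length_mm
    return 100.0 * metres_per_pixel


def shot_spacing_m(altitude_m, overlap, sensor_side_mm,
                   focal_length_mm=FOCAL_LENGTH_MM):
    """Distance between neighbouring images that gives the requested overlap."""
    footprint_m = sensor_side_mm * altitude_m / focal_length_mm
    return footprint_m * (1.0 - overlap)


if __name__ == "__main__":
    for altitude in (60.0, 30.0):
        gsd = ground_sample_distance_cm(altitude)
        along = shot_spacing_m(altitude, 0.80, SENSOR_HEIGHT_MM)  # front overlap
        across = shot_spacing_m(altitude, 0.70, SENSOR_WIDTH_MM)  # side overlap
        print(f"{altitude:.0f} m: GSD {gsd:.2f} cm/pixel, "
              f"exposure every {along:.2f} m, flight lines {across:.2f} m apart")
```

Under these assumptions the computed values are approximately 0.50 and 0.25 cm/pixel at 60 and 30 meters, consistent with the figures reported above.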

Publications

  • Type: Journal Articles Status: Published Year Published: 2021 Citation: Guo, W.; Carroll, M.E.; Singh, A.; Swetnam, T.; Merchant, N.; Sarkar, S.; Singh, A.K.; Ganapathysubramanian, B. UAS Based Plant Phenotyping for Research and Breeding Applications. Plant Phenomics 2021, 9840192.


Progress 09/01/19 to 08/31/20

Outputs
Target Audience: The target audience reached by our project includes undergraduate students, researchers, and postdoctoral fellows.
Changes/Problems: Nothing Reported
What opportunities for training and professional development has the project provided? Two PhD students were trained and licensed to fly UAVs for high-throughput, more precise and accurate plant stress phenotyping of various traits. The trained students are also showcasing their data collection campaign to undergraduate students who come for field visits.
How have the results been disseminated to communities of interest? Due to the COVID-19 situation, field days and farm progress shows were not conducted to connect with farmers.
What do you plan to do during the next reporting period to accomplish the goals? Next year we will continue Objectives 1 and 2 and will keep collecting data and integrating it seamlessly with data curation on the CyVerse platform.

Impacts
What was accomplished under these goals? In the first year of the project, we focused on our first objective, which is to develop standardized approaches for sUAS-based multispectral (UBM) data collection through the development of a scalable and sustainable approach. The UAV was flown with hyperspectral and RGB cameras at three different heights to collect data on abiotic stresses (for example, iron deficiency chlorosis and drought tolerance) and biotic stresses (for example, sudden death syndrome and frogeye leaf spot) in soybean in experimental fields around Ames, Iowa. A plant phenotyping protocol for measuring plant stress traits using a hyperspectral camera deployed on a UAV was developed. This includes identifying various tradeoffs in imaging (image overlap, image size, image resolution) and storage (compression) to identify and suggest viable best practices for data collection.

Publications

  • Type: Journal Articles Status: Published Year Published: 2020 Citation: Arti Singh, Sarah Jones, Baskar Ganapathysubramanian, Soumik Sarkar, Daren Mueller, Kulbir Sandhu, Koushik Nagasubramanian. Challenges and Opportunities in Machine-Augmented Plant Stress Phenotyping. Trends in Plant Science, 2020.