FACT: The PlantCV Platform: Open-source Tools for Plant Phenomics Across Scales

Sponsoring Institution

National Institute of Food and Agriculture

Project Status

COMPLETE

Funding Source

AFRI COMPETITIVE GRANT

Reporting Frequency

Annual

Accession No.

1019711

Grant No.

2019-67021-29926

Cumulative Award Amt.

$498,649.00

Proposal No.

2018-09179

Multistate No.

(N/A)

Project Start Date

Aug 1, 2019

Project End Date

Jul 31, 2023

Grant Year

2019

Program Code

[A1541]- Food and Agriculture Cyberinformatics and Tools

Recipient Organization
DONALD DANFORTH PLANT SCIENCE CENTER
975 NORTH WARSON ROAD
ST. LOUIS, MO 63132

Performing Department
(N/A)

Non Technical Summary
To feed, clothe, and power the world in the face of a growing and urbanizing population, we need technologies that accelerate plant breeding and crop development pipelines. Imaging and remote sensing technologies, coupled with data analytics, aim to increase the throughput of measuring plant physical and physiological features (phenotyping) by making it possible to non-destructively assay more genetic lineages at higher spatial and temporal resolution. While many tools and algorithms exist to extract information from image data, the use of image analysis tools for plant phenotyping is still a relatively new field, and many existing tools are either created for targeted purposes or are poorly maintained after release. We built the open-source PlantCV software package to address these challenges. Our goal is to provide a common interface for plant phenotyping algorithms in a modular platform that we and others can build on.
In this project we will build on the existing PlantCV platform to develop new tools and capabilities for plant phenotyping. In particular, we will develop new analysis capabilities that utilize machine learning for automated plant feature detection, and broaden support for new types of cameras and sensors. A new toolkit that streamlines the collection of human-curated data used to train machine learning algorithms will enhance the utility of the new analysis tools. A new data and computing management system will improve the ability of users to deploy plant phenotyping tools on diverse systems at infrastructure scale. Opportunities for training and education of stakeholders will be provided through hands-on workshops and online, interactive documentation.
Our overall goal, which aligns with the Food and Agriculture Cyberinformatics and Tools program area priorities, is to build a scalable analysis and data integration platform that enables stakeholders in the plant phenotyping community to effectively utilize data to accelerate discoveries in plant science and agricultural research.

Animal Health Component

Research Effort Categories

Basic

40%

Applied

Developmental

60%

Classification

Knowledge Area (KA)	Subject of Investigation (SOI)	Field of Science (FOS)	Percent
206	7210	2080	80%
903	7299	2080	20%

Knowledge Area
206 - Basic Plant Biology; 903 - Communication, Education, and Information Delivery;

Subject Of Investigation
7299 - Research equipment and methods, general/other; 7210 - Remote sensing equipment and technology;

Field Of Science
2080 - Mathematics and computer sciences;

Keywords

scalable infrastructure

Goals / Objectives
The overall goal of the project is to extend the PlantCV (https://plantcv.danforthcenter.org/) platform into a scalable plant phenotyping analysis and data integration platform that enables stakeholders in the plant science community to effectively utilize data to accelerate discoveries in plant science and agricultural research. The specific objectives of the project are to 1) develop analysis methods that utilize machine learning and high-dimensional datasets; 2) develop a highly scalable, platform-agnostic analysis workflow and data management system for plant image analysis; and 3) engage phenomics software users and developers from the public and private sector through hands-on workshops and by developing interactive documentation.

Project Methods
Evaluation of PlantCV: PlantCV and PlantCV training materials will be evaluated during quarterly virtual user meetings and ad hoc via GitHub Issues. New analysis methods and algorithms will be developed using our standard development workflow. New or updated PlantCV functionality is first developed locally. Local changes are submitted for review on GitHub (a pull request). Pull requests trigger an automated build and testing environment that evaluates whether the new and existing code functions correctly given known inputs and outputs. Coverage analysis measures the proportion of the codebase covered by tests to ensure that no areas of code are missing from the testing suite. Mandatory code review by someone other than the author of the pull request is done prior to merging the updated code with the master copy of PlantCV. These tools ensure that updates to the software are robust and sustainable. New methods will be developed using sample and training datasets. Performance of new algorithms will be benchmarked for general performance and will be evaluated using human-curated datasets or manual measurements.
Evaluation of stakeholder workshops: Hands-on workshops will be evaluated by pre- and post-workshop surveys. Workshops are structured so that stakeholders do image segmentation and extraction of trait data. Therefore, workshop success can be measured by successful analysis of image data during the workshop. All workshop materials will be made available online and downloads of training materials will be tracked.
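The known-input, known-output testing described in the development workflow above can be illustrated with a minimal sketch. The helper mask_area and both test functions are hypothetical and are not part of PlantCV. A test runner such as pytest would collect functions whose names start with test_, and the actual PlantCV test suite may be organized differently.

```python
# Hypothetical helper (not a PlantCV function): counts foreground pixels in a binary mask.
def mask_area(mask):
    """Return the number of nonzero pixels in a 2D mask given as nested lists."""
    return sum(1 for row in mask for px in row if px != 0)


def test_mask_area_known_input():
    # Known input: a 3x3 mask with four foreground pixels.
    mask = [
        [0, 255, 0],
        [255, 255, 0],
        [0, 0, 255],
    ]
    # Known output: the area must be exactly 4.
    assert mask_area(mask) == 4


def test_mask_area_empty():
    # Edge case: an empty mask has zero area.
    assert mask_area([]) == 0
```

Running such tests on every pull request is what lets a coverage tool report which lines of the code base were never exercised.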

Progress 08/01/19 to 07/31/23

Outputs
Target Audience: Our research program developed an open-source, modular software toolbox (PlantCV: https://plantcv.org) to enable high-throughput plant phenotyping (measuring or monitoring plant physical, physiological, and morphological properties) work in research, breeding, and crop management activities. Our efforts focused on the application of phenotyping technologies for research purposes, particularly in the context of controlled-environment (e.g. greenhouse and growth chamber) systems. However, this work was a critical starting place to build robust, foundational tools and frameworks that will enable expansion to other application areas in the future. Additionally, the tools we developed enable researchers to accelerate gene discovery and pre-breeding work by providing a robust framework for using multi-modal image data to non-destructively and quantitatively measure plant phenotypes at a high spatial and temporal resolution for large populations of plants. To measure the reach and impact of PlantCV to our target audiences, we monitored several usage statistics for the PlantCV software package and web-based resources. We manage a Twitter account (@plantcv) to disseminate general information and updates about PlantCV software to the community, and reached more than 1,450 followers by the end of the project. PlantCV software is available as source code through the GitHub platform where it is downloaded ~11 times per day on average. PlantCV is also available as an easy-to-install package on the Python Package Index and through Conda Forge where it is downloaded ~80 times per day on average. Over the full project period the PlantCV website (featuring our Twitter feed, pre-recorded seminars, webinars, and workshops about PlantCV, and publication lists) and documentation pages had tens of thousands of visitors from a worldwide audience and all 50 US states.
Cumulatively, 70 publications/preprints that utilized PlantCV were published, in addition to 4 papers by PlantCV developers. Current and potential users of PlantCV software are a key target audience, and a main objective of this project is to provide hands-on training opportunities. We have four virtual workshops through the PhenomeForce group (https://phenome-force.github.io/PhenomeForce/) and NAPPN 2021 annual meeting that have been viewed over 7,500 times.
Changes/Problems: Nothing Reported
What opportunities for training and professional development has the project provided? In our training workshops and webinars, participants were trained in image analysis techniques using PlantCV and tools from our collaborators (e.g. hyperspectral image analysis and deep learning). In aggregate we directly worked with 134 participants that included undergraduate students, graduate students, postdoctoral researchers, and other academic and industry professionals. While the COVID pandemic disrupted some of our planned in-person activities, we were able to quickly adjust to an online format that allowed us to post recordings of live workshops and reach an additional 7,500 viewers over the full project period. Additionally, the training workshops and webinars provided an opportunity for early career researchers on our team to develop skills as instructors and in providing high-quality user support. Early career team members affiliated with this project also had the opportunity to present their work through webinars, conference talks, and conference papers, resulting in 14 products. We have used the resources developed in this project along with follow-on funding to hire a full-time data science trainer to continue providing training workshops and teach courses at collaborating institutions (see above).
How have the results been disseminated to communities of interest? Work on PlantCV over the life of the project directly contributed to peer-reviewed publications.
PlantCV has been used in a total of 70 peer-reviewed research publications and preprints since its initial release in 2015; these are research papers from people using PlantCV in their research. In addition to workshops organized by this project, a total of 26 talks have highlighted the use of PlantCV (research talks, symposium talks, conference talks). Frequent updates about PlantCV developments are announced through our social media accounts. Additionally, we post updates to the code base frequently on GitHub, which always has the most up-to-date version of PlantCV. Users post questions and feature requests to the PlantCV team on GitHub, but we also use GitHub Issues to post public comments, questions, and discussions about potential directions we are thinking about, or changes we are planning, and open these issues for public input. To measure the reach and impact of PlantCV to our target audiences, we monitor several usage statistics for the PlantCV software package and web-based resources. We manage a Twitter account (@plantcv) to disseminate general information and updates about PlantCV software to the community and currently reach more than 1,450 followers. PlantCV software is available as source code through the GitHub platform where it is downloaded ~11 times per day on average. PlantCV is also available as an easy-to-install package on the Python Package Index and through Conda Forge where it is downloaded ~80 times per day on average. Over the full project period the PlantCV website (featuring our Twitter feed, pre-recorded seminars, webinars, and workshops about PlantCV, and publication lists) and documentation pages had tens of thousands of visitors from a worldwide audience and all 50 US states.
What do you plan to do during the next reporting period to accomplish the goals? Nothing Reported

Impacts
What was accomplished under these goals? Overall, this project supported the development of two major versions of PlantCV image analysis software (v3 and v4), and PlantCV supported the plant phenotyping work of researchers resulting in at least 70 user publications. A summary of accomplishments under each specific project objective is below.
Objective 1: The major aims of Objective 1 were to add support in PlantCV for machine learning methods and additional image data types, in particular high-dimensional datasets that contain spectral measurements beyond the typical color and grayscale image data types. This project enabled us to build in standardized support for three new data types: multispectral/hyperspectral, thermal, and chlorophyll fluorescence. Multispectral/hyperspectral imaging provides information for a wide range of discrete wavelengths both within and outside of the visible light range and can provide insights into the biochemistry of a plant or tissue without prohibitively costly metabolomic and biochemical profiling. Thermal imaging can be used to measure leaf temperature, and specialized imaging systems for measuring chlorophyll fluorescence are used to measure photosynthetic parameters. For multispectral/hyperspectral imaging, we developed flexible tools that work with data from a variety of vendors/formats (e.g. Headwall, Cubert, Teledyne, Specim, and others). We developed a generic data structure for spectral image data that labels the image channels with the camera-detected wavelengths. PlantCV's spectral data structure supports vegetative index (e.g. NDVI) analysis by automatically assessing whether the required wavelengths are supported by the given spectral dataset. Additionally, multispectral data can be constructed by registering and fusing data from multiple cameras (e.g. RGB, Near-Infrared, or Thermal).
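The idea of a wavelength-labeled spectral data structure that checks whether an index can be computed can be illustrated with a minimal sketch. The names SpectralImage, find_band, and ndvi, the 670 nm and 800 nm target wavelengths, and the 20 nm tolerance are all assumptions made for illustration. This is not PlantCV's data structure or API, and it uses plain Python lists rather than image arrays to stay self-contained.

```python
from dataclasses import dataclass


@dataclass
class SpectralImage:
    """Hypothetical container: one flat list of reflectance values per band."""
    wavelengths_nm: list  # center wavelength of each band, in nanometers
    bands: list           # bands[i] holds the pixels measured at wavelengths_nm[i]


def find_band(image, target_nm, tolerance_nm=20.0):
    """Return the index of the band closest to target_nm, or None if none is within tolerance."""
    best = min(
        range(len(image.wavelengths_nm)),
        key=lambda i: abs(image.wavelengths_nm[i] - target_nm),
        default=None,
    )
    if best is None or abs(image.wavelengths_nm[best] - target_nm) > tolerance_nm:
        return None
    return best


def ndvi(image, red_nm=670.0, nir_nm=800.0):
    """Compute NDVI = (NIR - red) / (NIR + red) per pixel if both bands are available."""
    red_i = find_band(image, red_nm)
    nir_i = find_band(image, nir_nm)
    if red_i is None or nir_i is None:
        raise ValueError("dataset lacks the red and/or near-infrared band needed for NDVI")
    values = []
    for red, nir in zip(image.bands[red_i], image.bands[nir_i]):
        total = nir + red
        # Guard against division by zero for pixels with no signal in either band.
        values.append((nir - red) / total if total != 0 else 0.0)
    return values


# A three-band dataset with green, red, and near-infrared channels.
cube = SpectralImage(
    wavelengths_nm=[550.0, 668.0, 802.0],
    bands=[[0.1, 0.1], [0.1, 0.2], [0.5, 0.2]],
)
index_values = ndvi(cube)
```

Because the bands carry their wavelengths, the same function works for any camera whose channels fall near the required wavelengths, and a dataset without a near-infrared band fails with a clear error instead of a silently wrong result.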
From spectral datasets, PlantCV can measure spectral reflectance and vegetative indices in both summary statistics and full distributions for downstream analysis. New methods were added to PlantCV to support thermal infrared image data (the FLIR format is currently supported) that are used to measure plant leaf temperature. The PlantCV methods that work with thermal image data can convert the input images between temperature values and image-based values as needed for analysis, but ensure that the final results presented to the user are in temperature units. A new submodule was developed in PlantCV to support the analysis of chlorophyll fluorescence images from instruments that support pulse amplitude modulation measurement methods (e.g. Phenovation CropReporter, Walz IMAGING-PAM). The photosynthesis submodule supports integrated analysis of the multiple imaging protocols supported by these instruments. Analyses of chlorophyll fluorescence that are supported by PlantCV include measuring the maximum quantum efficiency of photosystem II (PSII), PSII operating efficiency, and non-photochemical quenching. These photosynthetic parameters are measures of photosynthetic efficiency and plant stress. Over the course of the project we integrated several machine learning based approaches with PlantCV. First, we developed foundational tools for interactive image annotation directly in Jupyter notebooks that can be used in conjunction with PlantCV analysis methods to label image data for downstream machine learning tasks. Our point-based annotation tools were used to count and measure phenotypes of pollen grains and stomata from image data. Second, we integrated unsupervised machine learning approaches for color-based clustering, segmentation, and classification, including Gaussian Mixture Models and K-means clustering.
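The three chlorophyll fluorescence parameters named above have standard definitions, which can be written out as a short sketch. The function names and the pixel values are hypothetical and do not reflect the PlantCV photosynthesis submodule's interface. The inputs are the usual dark-adapted minimum and maximum fluorescence (F0, Fm) and the light-adapted steady-state and maximum fluorescence (F', Fm').

```python
import math
import statistics


def fv_fm(f0, fm):
    """Maximum quantum efficiency of PSII: Fv/Fm = (Fm - F0) / Fm."""
    return (fm - f0) / fm if fm > 0 else math.nan


def psii_operating_efficiency(f_prime, fm_prime):
    """PSII operating efficiency: (Fm' - F') / Fm'."""
    return (fm_prime - f_prime) / fm_prime if fm_prime > 0 else math.nan


def npq(fm, fm_prime):
    """Non-photochemical quenching: (Fm - Fm') / Fm'."""
    return (fm - fm_prime) / fm_prime if fm_prime > 0 else math.nan


def summarize(values):
    """Return mean and median of the finite values, mirroring a per-plant summary."""
    finite = [v for v in values if not math.isnan(v)]
    return {"mean": statistics.mean(finite), "median": statistics.median(finite)}


# Hypothetical per-pixel measurements for three plant pixels.
f0_pixels = [200.0, 250.0, 220.0]
fm_pixels = [1000.0, 1000.0, 1100.0]
fvfm_pixels = [fv_fm(f0, fm) for f0, fm in zip(f0_pixels, fm_pixels)]
fvfm_summary = summarize(fvfm_pixels)
```

Computing the parameter per pixel and summarizing afterwards keeps the full distribution available for downstream analysis, in addition to a single value per plant.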
Finally, we used a deep learning approach to segment individual plant leaves in image data and developed an approach to track individual leaves over time to trace their developmental trajectories. While these data and analyses were developed for plants grown in controlled environments, these tools are compatible with field-based phenotyping datasets and tools that will be developed in future work.
Objective 2: The major aim of Objective 2 was to develop a framework for distributed, highly parallel, and platform-agnostic computing in PlantCV. Through this project, we made PlantCV easier to install by automating the process of distributing stable release packages on the Python Package Index (PyPI) and Conda Forge channel. PlantCV can now be installed with a single command. At the outset of the project, PlantCV did parallel image analysis on a single computer and was limited by local computing resources. Through this project, we replaced the PlantCV multiprocessing framework with a distributed system based on Dask, which provides a flexible way to deploy PlantCV on multiple common types of distributed computing infrastructure, including HTCondor, LSF, PBS, SGE, SLURM, and others, while also maintaining local computing capabilities. Image analysis throughput can now be improved by distributing workloads across machines in a cluster rather than being limited to the resources available on a single machine. Parallel workflow analysis can now be configured using a configuration file (with a provided template) rather than complex command-line options. Minimal configuration differences are needed to deploy PlantCV on different infrastructures, making workflows more portable.
Objective 3: The major aim of Objective 3 was to train potential users and developers of PlantCV directly through workshops and indirectly through robust documentation.
This project directly supported interactive workshops at annual phenotyping conferences and in our local undergraduate summer internship program, where a total of 134 participants were trained in plant image analysis using PlantCV. Additionally, our live and recorded online workshops through the PhenomeForce group (https://phenome-force.github.io/PhenomeForce/) have been viewed over 7,500 times. Furthermore, the training materials developed for this project have been used as a foundation to develop a consistent training program at the Donald Danforth Plant Science Center and within the agtech innovation community in St. Louis, MO, and follow-on funding through the National Science Foundation allowed us to develop an undergraduate course in plant image analysis and data science at Harris Stowe State University (an HBCU in St. Louis, MO). Through this project we also developed robust online documentation for PlantCV (https://plantcv.readthedocs.io) and online interactive training materials. Using the Binder platform (https://mybinder.org/), we provide access to cloud-based, prebuilt environments with PlantCV tutorials where users can try PlantCV methods without installation. We organized our tutorials into a filterable gallery where users can quickly find example image analysis workflows that are similar to their own research question and use them as a foundation for building their own workflows.
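The configuration-driven parallel workflow described under Objective 2 can be sketched in a few lines. This sketch uses the standard library's concurrent.futures as a local stand-in for Dask so that it runs anywhere. The configuration keys (cluster, n_workers, images), the analyze placeholder, and make_executor are hypothetical and do not reproduce PlantCV's configuration template.

```python
import json
from concurrent.futures import ThreadPoolExecutor

# Hypothetical configuration; the keys are illustrative, not PlantCV's actual template.
CONFIG_TEXT = """
{
  "cluster": "local",
  "n_workers": 2,
  "images": ["plant1.png", "plant2.png", "plant3.png"]
}
"""


def analyze(image_path):
    """Placeholder for a per-image workflow; returns a small result record."""
    return {"image": image_path, "name_length": len(image_path)}


def make_executor(config):
    """Select an execution backend from the configuration.

    Only the local backend is implemented in this sketch. A real deployment
    would map other values (for example "slurm") to a cluster scheduler.
    """
    if config["cluster"] == "local":
        return ThreadPoolExecutor(max_workers=config["n_workers"])
    raise NotImplementedError(f"backend {config['cluster']!r} is not part of this sketch")


def run(config):
    """Run the per-image workflow over every configured image, preserving input order."""
    with make_executor(config) as pool:
        return list(pool.map(analyze, config["images"]))


results = run(json.loads(CONFIG_TEXT))
```

The point of the design is that moving the same workflow to another infrastructure changes only the cluster entry in the configuration, while the per-image analysis code stays untouched.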

Publications

Progress 08/01/22 to 07/31/23

Outputs
Target Audience: Our research program aims to develop an open-source, modular software toolbox (PlantCV: https://plantcv.org) to enable high-throughput plant phenotyping (measuring or monitoring plant physical, physiological, and morphological properties) work in research, breeding, and crop management activities. Our efforts to date have largely focused on the application of phenotyping technologies for research purposes, particularly in the context of controlled-environment (e.g. greenhouse and growth chamber) systems. However, this work is a critical starting place to build robust, foundational tools and frameworks that will enable expansion to other application areas in the future. Additionally, the tools we are developing enable researchers to accelerate gene discovery and pre-breeding work by providing a robust framework for using multi-modal image data to non-destructively and quantitatively measure plant phenotypes at a high spatial and temporal resolution for large populations of plants. To measure the reach and impact of PlantCV to our target audiences, we monitor several usage statistics for the PlantCV software package and web-based resources. We manage a Twitter account (@plantcv) to disseminate general information and updates about PlantCV software to the community and currently reach more than 1,450 followers. PlantCV software is available as source code through the GitHub platform where it is downloaded ~11 times per day on average. PlantCV is also available as an easy-to-install package on the Python Package Index and through Conda Forge where it is downloaded ~80 times per day on average. Over the past project year, the PlantCV website (featuring our Twitter feed, pre-recorded seminars, webinars, and workshops about PlantCV, and publication lists) and documentation pages had thousands of visitors. About 29% of website visitors were from the US, but we also had worldwide reach.
Finally, in the past project year, 23 publications/preprints that utilized PlantCV were published, in addition to 1 conference paper by PlantCV developers. Current and potential users of PlantCV software are a key target audience, and a main objective of this project is to provide hands-on training opportunities. We have four virtual workshops through the PhenomeForce group (https://phenome-force.github.io/PhenomeForce/) and NAPPN 2021 annual meeting that have been viewed over 7,500 times.
Changes/Problems: Nothing Reported
What opportunities for training and professional development has the project provided? In the 2023 NSF REU program at the Donald Danforth Plant Science Center, we taught a PlantCV workshop to all of the students, and several of the 21 participants used PlantCV in their summer research projects. We also hosted a 5-day (2-3 hours per day) PlantCV workshop for Danforth Center and Harris Stowe State University (an HBCU in St. Louis, MO) participants (10-20 per day) to learn foundational image analysis techniques. We also hosted a full-day PlantCV workshop for 13 members of the commercial innovation community in St. Louis (small and large companies). In addition to the participants trained, the workshops were a training opportunity for early career researchers on the instructor team, who helped prepare and organize the training materials and led their own training modules. PlantCV has also been highlighted in several talks presented by early career researchers, listed below:
Acosta-Gamboa L, Czymmek K, Kenney S, Gordon J, Fahlgren N, Gutierrez J, Schuhl H, Gehan M. 2022. Chenopodium quinoa as a model plant to study salt stress. Presented at the North American Plant Phenotyping Network 2023 conference.
Bhatt P, Kaggwa R, Bhatt PK, Leal SM, Podleski A, Callis-Duehl K, Gehan M, Fahlgren N. 2022. Preparing an Undergraduate Workforce at an HBCU for Career Opportunities in Plant Data Science through Immersive Classroom Experiences using PlantCV. Presented at the North American Plant Phenotyping Network 2023 conference.
Ludwig E, Polydore S, Berry J, Sumner J, Haines K, Greenham K, Fahlgren N, Mockler TC, Gehan MA. 2022. Natural variation in Brachypodium distachyon responses to combined abiotic stresses. Presented at the North American Plant Phenotyping Network 2023 conference.
Murphy K, Harmon C, Allen D, Gehan M. 2022. High-throughput microscopy image analysis of plant stomata. Presented at the North American Plant Phenotyping Network 2023 conference.
Schuhl H, Peery JD, Gutierrez J, Gehan MA, Fahlgren N. Simplifying PlantCV workflows with multiple objects. Presented at the North American Plant Phenotyping Network 2023 conference.
How have the results been disseminated to communities of interest? Work on PlantCV during this reporting period directly contributed to peer-reviewed publications. PlantCV has been used in a total of 70 peer-reviewed research publications and preprints since its initial release in 2015; these are research papers from people using PlantCV in their research. In the past project year, 24 publications/preprints that utilized PlantCV were published. In addition to workshops organized by this project, a total of 13 talks have highlighted the use of PlantCV (research talks, symposium talks, conference talks). Frequent updates about PlantCV developments are announced through our social media accounts. Additionally, we post updates to the code base frequently on GitHub, which always has the most up-to-date version of PlantCV. Users post questions and feature requests to the PlantCV team on GitHub, but we also use GitHub Issues to post public comments, questions, and discussions about potential directions we are thinking about, or changes we are planning, and open these issues for public input. To measure the reach and impact of PlantCV to our target audiences, we monitor several usage statistics for the PlantCV software package and web-based resources.
We manage a Twitter account (@plantcv) to disseminate general information and updates about PlantCV software to the community and currently reach more than 1,450 followers. PlantCV software is available as source code through the GitHub platform where it is downloaded ~11 times per day on average. PlantCV is also available as an easy-to-install package on the Python Package Index and through Conda Forge where it is downloaded ~80 times per day on average. Over the past project year, the PlantCV website (featuring our Twitter feed, pre-recorded seminars/webinars about PlantCV, and publication lists) and documentation pages had thousands of visitors. About 29% of website visitors were from the US, but we also had worldwide reach.
What do you plan to do during the next reporting period to accomplish the goals? Nothing Reported

Impacts
What was accomplished under these goals? During the no-cost extension period, our major goal was to release PlantCV v4.0. The fourth major version of PlantCV is the culmination of work enabled by this project. Under Objective 1, PlantCV v4 expands the set of tools available for the new data types added during this project, including hyperspectral, thermal, and chlorophyll fluorescence imaging. Overall image analysis workflow development was simplified by reducing the number of input and output variables. The reduction in user coding complexity was achieved without removing functionality by moving common tasks that can be automated to background tasks, encapsulating data types that are always used together, and removing some input variables that in practice are never altered and where a default value can be used instead. Additionally, all PlantCV analysis methods now automatically iterate over multiple objects of interest, whereas previous versions required users to write their own loops. Overall, PlantCV image analysis workflows require much less Python coding. In addition to these improvements, data visualizations in PlantCV v4 are interactive, and point/coordinate-based annotation is now included as a foundation for future image annotation tools that work directly in PlantCV analysis workflow notebooks. Built-in machine learning methods were improved for faster performance. Under Objective 2, workflow parallelization now supports image grouping for more complex workflows that can handle multiple input images, and supports multiple dataset layouts and metadata structures. Under Objective 3, the documentation was updated to match v4 functionality, and all example tutorials and templates were updated and modernized. Tutorials in the gallery in the documentation are now hosted in separate GitHub repositories so that they can be individually versioned and can be added from a variety of sources or authors.
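The image grouping mentioned under Objective 2 can be illustrated with a minimal sketch in which images that share metadata values are bundled so that one workflow run receives all of them. The filename pattern plantid_camera_timestamp.png and the functions parse_metadata and group_images are assumptions made for illustration and do not describe PlantCV's metadata handling.

```python
from collections import defaultdict


def parse_metadata(filename):
    """Parse a hypothetical 'plantid_camera_timestamp.png' filename into a metadata dict."""
    stem = filename.rsplit(".", 1)[0]
    plant_id, camera, timestamp = stem.split("_")
    return {"plant_id": plant_id, "camera": camera, "timestamp": timestamp, "file": filename}


def group_images(filenames, keys=("plant_id", "timestamp")):
    """Group images that share the same values for the given metadata keys."""
    groups = defaultdict(list)
    for name in filenames:
        meta = parse_metadata(name)
        groups[tuple(meta[k] for k in keys)].append(meta)
    return dict(groups)


# Two cameras imaged plant A1 at the same time point, one camera imaged plant B2.
files = ["A1_rgb_20230101.png", "A1_nir_20230101.png", "B2_rgb_20230101.png"]
groups = group_images(files)
for key, members in groups.items():
    print(key, [m["camera"] for m in members])
```

Changing the keys argument changes what counts as one unit of work, for example grouping by plant alone to hand a workflow the full time series for each plant.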

Publications

Type: Journal Articles Status: Published Year Published: 2022 Citation: Beyene G, Chauhan RD, Villmer J, Husic N, Wang N, Gebre E, Girma D, Chanyalew S, Assefa K, Tabor G, Gehan M, McGrone M, Yang M, Lenderts B, Schwartz C, Gao H, Gordon-Kamm W, Taylor NJ, MacKenzie DJ. 2022. CRISPR/Cas9-mediated tetra-allelic mutation of the Green Revolution SEMIDWARF-1 (SD-1) gene confers lodging resistance in tef (Eragrostis tef). Plant Biotechnology Journal 20:1716-1729. DOI: 10.1111/pbi.13842.
Type: Journal Articles Status: Published Year Published: 2023 Citation: Panda K, Mohanasundaram B, Gutierrez J, McLain L, Castillo SE, Sheng H, Casto A, Gratacós G, Chakrabarti A, Fahlgren N, Pandey S, Gehan MA, Slotkin RK. 2023. The plant response to high CO2 levels is heritable and orchestrated by DNA methylation. The New Phytologist 238:2427-2439. DOI: 10.1111/nph.18876.
Type: Journal Articles Status: Published Year Published: 2023 Citation: Ludwig E, Sumner J, Berry J, Polydore S, Ficor T, Agnew E, Haines K, Greenham K, Fahlgren N, Mockler TC, Gehan MA. 2023. Natural variation in Brachypodium distachyon responses to combined abiotic stresses. The Plant Journal: For Cell and Molecular Biology. DOI: 10.1111/tpj.16387.
Type: Journal Articles Status: Accepted Year Published: 2023 Citation: Murphy K, Ludwig E, Gutierrez Ortega JA, Gehan MA. 2023. Deep Learning in Plant Phenotyping. Annual Reviews in Plant Biology. Accepted.
Type: Conference Papers and Presentations Status: Published Year Published: 2022 Citation: Schuhl H, David Peery J, Gutierrez J, Gehan MA, Fahlgren N. 2022. Simplifying PlantCV workflows with multiple objects. Authorea Preprints. DOI: 10.22541/au.166758437.76129704/v1.
Type: Conference Papers and Presentations Status: Other Year Published: 2023 Citation: Gehan, M.A. 2023. Utilizing Natural Variation and High-Throughput Phenotyping for Crop Improvement. In: Gordon Research Conference, Salt and Water Stress, Switzerland.
Type: Conference Papers and Presentations Status: Other Year Published: 2023 Citation: Gehan, M.A. 2023. What Phenotypes Matter? Open Challenges in Plant Phenomics with PlantCV. In: Illinois University, Edwardsville Seminar Series.
Type: Conference Papers and Presentations Status: Other Year Published: 2023 Citation: Gehan, M.A. 2023. What Phenotypes Matter? Open Challenges in Plant Phenomics with PlantCV. In: Iowa State University Plant Breeding Symposium.
Type: Conference Papers and Presentations Status: Other Year Published: 2023 Citation: Gehan, M.A. 2023. What Phenotypes Matter? Open Challenges in Plant Phenomics with PlantCV. In: Northeastern State University Seminar Series.
Type: Conference Papers and Presentations Status: Other Year Published: 2023 Citation: Gehan, M.A. 2023. What Phenotypes Matter? Open Challenges in Plant Phenomics with PlantCV. In: Washington University in St. Louis, Seminar.
Type: Conference Papers and Presentations Status: Other Year Published: 2023 Citation: Gehan, M.A. 2023. What Phenotypes Matter? Open Challenges in Plant Phenomics with PlantCV. In: Single-Cell Approaches Gordon Research Conference. Ventura, California.
Type: Conference Papers and Presentations Status: Other Year Published: 2023 Citation: Gehan, M.A. 2023. What Phenotypes Matter? Open Challenges in Plant Phenomics with PlantCV. In: University of Minnesota, Seminar.

Progress 08/01/21 to 07/31/22

Outputs
Target Audience: Our research program aims to develop an open-source, modular software toolbox (PlantCV: https://plantcv.danforthcenter.org/) to enable high-throughput plant phenotyping (measuring or monitoring plant physical, physiological, and morphological properties) work in research, breeding, and crop management activities. Our efforts to date have largely focused on the application of phenotyping technologies for research purposes, particularly in the context of controlled-environment (e.g. greenhouse and growth chamber) systems. However, this work is a critical starting place to build robust, foundational tools and frameworks that will enable expansion to other application areas in the future. Additionally, the tools we are developing enable researchers to accelerate gene discovery and pre-breeding work by providing a robust framework for using multi-modal image data to non-destructively and quantitatively measure plant phenotypes at a high spatial and temporal resolution for large populations of plants. To measure the reach and impact of PlantCV to our target audiences, we monitor several usage statistics for the PlantCV software package and web-based resources. We manage a Twitter account (@plantcv) to disseminate general information and updates about PlantCV software to the community and currently reach more than 1,400 followers. PlantCV software is available as source code through the GitHub platform where it was downloaded ~3,356 times (or ~11 times per day on average) over the past project year. PlantCV is also available as an easy-to-install package on the Python Package Index and through Conda Forge where it was downloaded over 100,000 times combined over the past project year. Over the past project year, the PlantCV website (featuring our Twitter feed, pre-recorded seminars, webinars, and workshops about PlantCV, and publication lists) and documentation pages had thousands of visitors.
About 25% of website visitors were from the US (twice the share of the next highest country, India), with 9 or more visitors from each of the 50 states, but we also had worldwide reach. Finally, in the past project year, 14 publications/preprints that utilized PlantCV were published, in addition to 2 conference papers by PlantCV developers. Current and potential users of PlantCV software are a key target audience, and a main objective of this project is to provide hands-on training opportunities. We held a half-day PlantCV workshop (see accomplishments for more details) at the 2022 North American Plant Phenotyping Network (NAPPN) annual conference in February 2022. Our workshop had 11 participants, ~73% from university, non-profit, and government institutions, and ~27% from the private sector. Participants included early career researchers (3 postdoctoral researchers and 2 graduate students). The workshop was also a training opportunity for the instructor staff. In addition to the conference workshop, we have four virtual workshops through the PhenomeForce group (https://phenome-force.github.io/PhenomeForce/) and NAPPN 2021 annual meeting that have been viewed over 4,700 times. Changes/Problems: Nothing Reported What opportunities for training and professional development has the project provided? Participants in our third annual phenotyping workshop at the 2022 NAPPN annual conference (in-person) were trained in image analysis using PlantCV. The workshop directly trained 11 researchers but was also an opportunity for early-career researchers on the instructor team, who helped to prepare and organize the training materials for this in-person workshop. In the 2022 NSF REU program at the Donald Danforth Plant Science Center, we taught a PlantCV and bioinformatics workshop to all of the students, and 5 out of 21 participants used PlantCV in their summer research projects. 
PlantCV has also been highlighted in several talks presented by early career researchers listed below: Murphy K. 2022. Understanding heat stress susceptibility of a next-generation biofuel crop using high-throughput phenotyping. Presented at the Plant Biology 2022 conference. Polydore S, Fahlgren N. 2021. Phenotypic analysis of a European Camelina sativa diversity panel. Earth and Space Science Open Archive. DOI: 10.1002/essoar.10508336.2. Presented at the 2022 NAPPN annual conference. Gutierrez Ortega JA, Castillo SE, Gehan M, Fahlgren N. 2021. Segmentation of overlapping plants in multi-plant image time series. Earth and Space Science Open Archive. DOI: 10.1002/essoar.10508337.2. Presented at the 2022 NAPPN annual conference. Casto A, Schuhl H, Schneider D, Wheeler J, Gehan M, Fahlgren N. 2021. Analyzing chlorophyll fluorescence images in PlantCV. Earth and Space Science Open Archive. DOI: 10.1002/essoar.10508322.2. Presented at the 2022 NAPPN annual conference. How have the results been disseminated to communities of interest? Work on PlantCV during this reporting period directly contributed to a peer-reviewed publication. PlantCV has been used in a total of 46 peer-reviewed research publications and preprints since its initial release in 2015; these are research papers from people using PlantCV in their research. In the past project year, 14 publications/preprints that utilized PlantCV were published. In addition to workshops organized by this project, a total of 5 talks have highlighted the use of PlantCV (research talks, symposium talks, conference talks). Frequent updates about PlantCV developments are announced through our Twitter social media account. Additionally, we post updates to the code base frequently on GitHub, which always has the most updated version of PlantCV. 
Users post questions and feature requests to the PlantCV team on GitHub, but we also use GitHub Issues to post public comments, questions, and discussions about potential directions we are thinking about, or changes we are planning, and open these issues for public input. To measure the reach and impact of PlantCV to our target audiences, we monitor several usage statistics for the PlantCV software package and web-based resources. We manage a Twitter account (@plantcv) to disseminate general information and updates about PlantCV software to the community and currently reach more than 1,400 followers. PlantCV software is available as source code through the GitHub platform where it was downloaded ~3,356 times (or ~11 times per day on average) over the past project year. PlantCV is also available as an easy-to-install package on the Python Package Index and through Conda Forge where it was downloaded more than 100,000 times combined over the past project year. Over the past project year, the PlantCV website (featuring our Twitter feed, pre-recorded seminars/webinars about PlantCV, and publication lists) and documentation pages had thousands of visitors. About 25% of website visitors were from the US (twice the share of the next highest country, India), with 9 or more visitors from each of the 50 states, but we also had worldwide reach. What do you plan to do during the next reporting period to accomplish the goals? Objective 1: Develop plant phenotyping analysis methods that utilize machine learning and high-dimensional datasets. Over the final project year we will release a new major version of PlantCV (v4) that contains some of the developments above. Several annotation tools described above have been developed and are available on GitHub but are not yet fully integrated with PlantCV; we plan to include them in the first major release of version 4. 
Objective 2: Develop a highly scalable, platform-agnostic analysis workflow and data management system for plant image analysis. To further improve PlantCV usability, we plan to develop methods to allow users to run large-scale analyses directly from the Jupyter environment. Objective 3: Engage PlantCV users and developers from the public and private sector through hands-on workshops and by developing interactive documentation. We will organize an in-person hands-on PlantCV workshop at the 2023 NAPPN Annual Conference (Donald Danforth Plant Science Center), and all materials for this workshop will be made available online. We also plan to develop PlantCV analysis templates, which will complement tutorials. Templates will be reusable PlantCV workflows with minimal tunable parameters, designed to operate on specific dataset types but, like a tutorial, readily adaptable to other datasets.

Impacts
What was accomplished under these goals? Objective 1: A major aim of Objective 1 is to add support in PlantCV for additional image data types, in particular high-dimensional datasets that contain spectral measurements beyond the typical color and grayscale image data types. For example, multispectral/hyperspectral imaging provides information for a wide range of discrete wavelengths both within and outside of the visible light range and can provide insights into the biochemistry of a plant or tissue without prohibitively costly metabolomic and biochemical profiling. Other examples include thermal imaging that can be used to measure leaf temperature and specialized imaging systems for measuring chlorophyll fluorescence that are used to measure photosynthetic parameters. In previous reporting periods, new submodules for working with hyperspectral, thermal, and chlorophyll fluorescence image data were added. In the current reporting period, the hyperspectral and photosynthesis (handling chlorophyll fluorescence images) submodules were improved to support a wider range of sensor vendor datasets, based on input from PlantCV users. The photosynthesis submodule now supports photosynthetic efficiency measurements from both Phenovation CropReporter and Walz IMAGING-PAM systems (see Casto et al. 2021). The hyperspectral submodule was designed to work with hyperspectral data in the common geospatial ENVI standard. However, differences in ENVI implementation can lead to processing issues, and PlantCV now supports a wider range of ENVI data from vendors such as Headwall, Cubert, Teledyne, Specim, and others. While hyperspectral datasets contain deep information content, they are also large (typically in the 1-20 GB range currently) due to the detailed spectrographic information. When imaging multiple items (e.g. seeds, plants, leaves, etc.), analyses can focus on regions containing individual items one at a time. 
Additionally, hyperspectral image data contains high-density spectral information that can potentially be subsetted to focus analysis on the most informative reflectance wavelengths. To support data subsetting, a new feature was added to the hyperspectral submodule that allows a user to save image subregions or subsetted data cubes for downstream analysis. This allows users to use a multi-workflow strategy to analyze large hyperspectral datasets, where a limited first workflow that runs on high-memory servers subsets the data and saves the smaller datasets to disk, followed by downstream workflows that can run on lower-resource servers and analyze the smaller datasets. Reducing the computational resource requirements reduces the cost of analysis and makes hyperspectral datasets more compatible with machine learning work. Imaging of plants over time also creates high-dimensional datasets. Analysis of time-series datasets with PlantCV has typically focused on each image as a separate computational unit, with time-series analysis of observed phenotypes occurring downstream. However, time-series image stacks contain information that can be used to improve image analysis. For example, a common problem in high-density, multi-plant imaging over time is that segmentation of individual plants from the background fails once neighboring plants touch or overlap (plants that touch are treated as a single plant), but earlier within the time series the plants can be identified as separate. To access this information, we developed a new time-series image segmentation method in PlantCV that operates on the full time series as a three-dimensional (two spatial, one temporal) dataset (see Gutiérrez et al. 2021). The new method uses individually segmented plants at an early time point as markers for watershed segmentation where label propagation is only allowed to flow forward in time. 
This allows PlantCV to simultaneously segment individual plants across the time series and separate individuals that touch or overlap. Another major aim of Objective 1 is to develop improved methods for collecting image annotation data for training machine learning algorithms. PlantCV image analysis workflow development occurs in the Jupyter Notebook environment, so our ideal is that annotation work can also be done in Jupyter to avoid having to switch between different interfaces. To address this, in the current reporting period we developed a suite of interactive tools that operate in Jupyter and function as annotation tools. The base class of these annotation tools is an interactive point coordinate collecting method. The other tools build on the base class to form annotation tools for regions of interest (or bounding boxes) and for classifying objects. For the latter, we used an automated approach to segment objects (e.g. pollen) and then loaded the segmented objects into an annotation tool. This allowed users to correct mistakes in the automated segmentation and to split the objects into multiple categories (see Castillo et al. 2022). Objective 2: A major aim of Objective 2 is to develop a framework for distributed, highly parallel, and platform-agnostic computing in PlantCV. Users develop workflows in Jupyter Notebooks and then have to convert these notebooks to Python scripts for parallel analysis, which requires some coding expertise. In the current reporting period, we redesigned some features of the parallelization module to reduce or eliminate the coding requirements needed to do parallel analysis. We also improved the image dataset handling engine to add the ability for users to define image groupings that are passed to each parallel workflow. Image grouping is useful when a workflow utilizes more than one input image. 
For example, a workflow may take an entire time series as input (see Objective 1) or may combine multiple imaging modalities, such as color (RGB) and near-infrared, to calculate spectral indices such as NDVI (see previous reporting period). Objective 3: We developed PlantCV as a Python package to make it both a powerful tool for users with a background in bioinformatics and computer science and a relatively easy-to-use tool for plant biologists and other target audiences with less programming experience. We recognize that one pain point for people without experience installing software of this nature is that getting everything set up correctly can be frustrating before the user even knows whether the tool works for their research question or interest. In the previous reporting periods we introduced interactive documentation that utilizes the cloud-based Binder (https://mybinder.org/) environment and a tutorial gallery that organizes our Binder-based example PlantCV workflows. These tutorials and environment allow users to quickly find examples similar to their own research question and allow them to try the analysis without installing software. Our aim in the current reporting period was to continue building up the tutorial resource by adding tutorials from PlantCV user publications. Current examples include tutorials for landmark point analysis (https://github.com/danforthcenter/plantcv-homology-tutorials), maize tassel analysis with machine learning (https://github.com/danforthcenter/plantcv-tasselyzer-tutorial), and interactive quinoa pollen annotation and analysis (https://github.com/danforthcenter/plantcv-tutorial-interactive-pollent-count). We were able to return to an in-person format for our annual hands-on workshop at the 2022 NAPPN annual conference, which had 11 total participants. 
We also taught a workshop to 23 NSF REU students and other summer interns on using PlantCV and other bioinformatics resources (https://github.com/danforthcenter/nsf-reu-2022).
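The forward-only label propagation described under Objective 1 can be illustrated with a minimal sketch. This is not the PlantCV implementation: it assumes binary foreground masks stored as nested lists, replaces the watershed step with a plain breadth-first flood inside each frame, and uses hypothetical function names (`propagate_forward`, `segment_time_series`). Labels from the first frame act as markers and are carried only from frame t-1 to frame t, never backward.

```python
from collections import deque


def propagate_forward(prev_labels, mask):
    """Label the foreground of `mask` by flooding from the previous frame's labels."""
    rows, cols = len(mask), len(mask[0])
    labels = [[0] * cols for _ in range(rows)]
    queue = deque()
    # Seeds: pixels that were labeled in the previous frame and are still foreground.
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and prev_labels[r][c]:
                labels[r][c] = prev_labels[r][c]
                queue.append((r, c))
    # Breadth-first flood: an unlabeled foreground pixel takes the label of
    # whichever seeded region reaches it first (4-connected neighbors).
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and mask[nr][nc] and not labels[nr][nc]:
                labels[nr][nc] = labels[r][c]
                queue.append((nr, nc))
    return labels


def segment_time_series(first_labels, later_masks):
    """Carry the labels of the first frame forward through the later frames."""
    series = [first_labels]
    for mask in later_masks:
        series.append(propagate_forward(series[-1], mask))
    return series


# Toy one-row series: two plants start apart and touch in the last frame.
first_frame = [[1, 0, 0, 0, 2]]
later_frames = [[[1, 1, 0, 1, 1]], [[1, 1, 1, 1, 1]]]
result = segment_time_series(first_frame, later_frames)
print(result[-1])
```

In the last frame the two plants form a single connected region, yet each pixel keeps the identity it inherited from the earlier frames, which is the behavior the report describes for touching or overlapping plants.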

Publications

Type: Conference Papers and Presentations Status: Published Year Published: 2021 Citation: Gutierrez Ortega JA, Castillo SE, Gehan M, Fahlgren N. 2021. Segmentation of overlapping plants in multi-plant image time series. Earth and Space Science Open Archive. DOI: 10.1002/essoar.10508337.2.
Type: Conference Papers and Presentations Status: Published Year Published: 2021 Citation: Casto A, Schuhl H, Schneider D, Wheeler J, Gehan M, Fahlgren N. 2021. Analyzing chlorophyll fluorescence images in PlantCV. Earth and Space Science Open Archive. DOI: 10.1002/essoar.10508322.2.
Type: Conference Papers and Presentations Status: Published Year Published: 2021 Citation: Schuhl H, Gehan M, Fahlgren N. 2021. Workshop: Image analysis in Python with PlantCV. In: 2021 NAPPN Annual Conference. North American Plant Phenotyping Network. https://youtu.be/zvn_05cE0L4
Type: Conference Papers and Presentations Status: Published Year Published: 2022 Citation: Gehan, M.A. 2022. What Phenotypes Matter? Open Challenges in Plant Phenomics with PlantCV. In: 2022 North American Plant Phenotyping Network Annual Conference.
Type: Conference Papers and Presentations Status: Published Year Published: 2022 Citation: Gehan, M.A. 2022. Open Challenges in Plant Phenotyping with PlantCV. In: The FABI International Seminar Series.
Type: Journal Articles Status: Submitted Year Published: 2022 Citation: Castillo SE, Tovar JC, Shamin A, Gutirerrez J, Pearson P, Gehan MA. 2022. A protocol for Chenopodium quinoa pollen germination. Plant Methods 18:65. DOI: 10.1186/s13007-022-00900-3.

Progress 08/01/20 to 07/31/21

Outputs
Target Audience:Our research program aims to develop an open-source, modular software toolbox (PlantCV: https://plantcv.danforthcenter.org/) to enable high-throughput plant phenotyping (measuring or monitoring plant physical, physiological, and morphological properties) work in research, breeding, and crop management activities. Our efforts to date have largely focused on the application of phenotyping technologies for research purposes, particularly in the context of controlled-environment (e.g. greenhouse and growth chambers) systems. But this work is a critical starting place to build robust, foundational tools and frameworks that will enable expansion to other application areas in the future. Additionally, the tools we are developing enable researchers to accelerate gene discovery and pre-breeding work by providing a robust framework for using multi-modal image data to non-destructively and quantitatively measure plant phenotypes at a high spatial and temporal resolution for large populations of plants. To measure the reach and impact of PlantCV to our target audiences, we monitor several usage statistics for the PlantCV software package and web-based resources. We manage a Twitter account (@plantcv) to disseminate general information and updates about PlantCV software to the community and currently reach more than 1,313 followers. PlantCV software is available as source code through the GitHub platform where it was downloaded ~2,777 times (or ~10 times per day on average) over the past project year. PlantCV is also available as an easy to install package on the Python Package Index and through Conda Forge where it was downloaded a combined 30,000+ times over the past project year. Over the past project year, the PlantCV website (featuring our Twitter feed, pre-recorded seminars/webinars about PlantCV, and publication lists) and documentation pages had thousands of visitors. 
About 25% of website visitors were from the US (twice the share of the next highest country, India), with 10 or more visitors from each of the 50 states, but we also had worldwide reach. Finally, in the past project year, 11 publications/preprints that utilized PlantCV were published. Current and potential users of PlantCV software are a key target audience, and a main objective of this project is to provide hands-on training opportunities. We held a half-day PlantCV workshop (see accomplishments for more details) at the 2021 North American Plant Phenotyping Network (NAPPN) annual conference in February 2021. Our workshop had 45 participants (not including instructors), ~80% from university, non-profit, and government institutions, and ~20% from the private sector. Participants included early career researchers (6 postdoctoral researchers, 10 graduate students, and 1 high school student). The workshop was also a training opportunity for the instructor staff. In addition to the conference workshop, we also led three virtual workshops through the PhenomeForce group (https://phenome-force.github.io/PhenomeForce/). Together, our PhenomeForce workshops have been viewed over 2,800 times. Changes/Problems: With COVID-19, the PlantCV development team has been successfully working remotely. In 2021 we were unable to hold an in-person hands-on workshop but were able to hold a virtual workshop at the 2021 NAPPN Annual Conference. This hands-on virtual workshop had 45 participants, and all workshop materials are available online on GitHub (https://github.com/danforthcenter/plantcv-nappn2021-workshop). We have also held several virtual workshops on PlantCV and Raspberry Pi imaging through the PhenomeForce series. A potential advantage of hosting the workshops online is that we could likely reach many more participants. Despite the uncertainty with COVID-19, we do not anticipate any major issues with working on this project successfully. 
What opportunities for training and professional development has the project provided? Participants in our second annual phenotyping workshop at the 2021 NAPPN annual conference (virtual) were trained in image analysis using PlantCV. The workshop directly trained 45 researchers but was also an opportunity for early-career researchers on the instructor team, who helped to prepare and organize the training materials for this virtual conference and other virtual workshops done throughout the year. In 2021, the Donald Danforth Plant Science Center organized a completely virtual REU program, and 5 out of 13 participants used PlantCV in their summer research projects. In 2021, undergraduate researcher Ella Ludwig (co-PD Gehan as mentor) used PlantCV in her completely computational undergraduate thesis project, which was awarded a highly prestigious Spector award for best undergraduate biology thesis at Washington University in St. Louis. PlantCV has also been highlighted in several talks/workshops done by early career researchers listed below: Tovar J, Gehan M. 2021. Workshop: Measurement of plant phenotypes with low-cost Raspberry Pi computers and cameras. In: 2021 Fridays Hands-On Workshop Series. PhenomeForce. https://www.youtube.com/watch?v=4mT7I6PFNa4&t=2511s Berry J, Fahlgren N. 2021. Workshop: Getting started with Raspberry Pi. In: 2021 Fridays Hands-On Workshop Series. PhenomeForce. https://www.youtube.com/watch?v=WC0oXKY47aU&t=4114s Schuhl H, Gehan M, Fahlgren N. 2021. Workshop: Image analysis in Python with PlantCV. In: 2021 NAPPN Annual Conference. North American Plant Phenotyping Network. Schuhl H. 2021. Automated Leaf Angle Measurements in Grain Crops Using Skeleton Segmentation. In: 2021 NAPPN Annual Conference. North American Plant Phenotyping Network. Schuhl H, Gehan M, Fahlgren N. 2020. Workshop: An introduction to image analysis workflows with PlantCV. In: 2020 Fridays Hands-On Workshop Series. PhenomeForce. 
https://www.youtube.com/watch?v=fVoPjvgT400 How have the results been disseminated to communities of interest? Work on PlantCV during this reporting period directly contributed to a peer-reviewed publication. PlantCV has been used in a total of 31 peer-reviewed research publications and preprints since its initial release in 2015; these are research papers from people using PlantCV in their research. In the past project year, 11 publications/preprints that utilized PlantCV were published. In addition to workshops organized by this project, a total of 10 talks have highlighted the use of PlantCV (research talks, symposium talks, conference talks). Frequent updates about PlantCV developments are announced through our Twitter social media account. Additionally, we post updates to the code base frequently on GitHub, which always has the most updated version of PlantCV. Users post questions and feature requests to the PlantCV team on GitHub, but we also use GitHub Issues to post public comments, questions, and discussions about potential directions we are thinking about, or changes we are planning, and open these issues for public input. To measure the reach and impact of PlantCV to our target audiences, we monitor several usage statistics for the PlantCV software package and web-based resources. We manage a Twitter account (@plantcv) to disseminate general information and updates about PlantCV software to the community and currently reach more than 1,313 followers. PlantCV software is available as source code through the GitHub platform where it was downloaded ~2,777 times (or ~10 times per day on average) over the past project year. PlantCV is also available as an easy-to-install package on the Python Package Index and through Conda Forge where it was downloaded a combined 30,000+ times over the past project year. 
Over the past project year, the PlantCV website (featuring our Twitter feed, pre-recorded seminars/webinars about PlantCV, and publication lists) and documentation pages had thousands of visitors. About 25% of website visitors were from the US (twice as high as the next highest country, India) and all 50 states (10 or more visitors from 50 states), but we also had worldwide reach. What do you plan to do during the next reporting period to accomplish the goals?Objective 1: Develop plant phenotyping analysis methods that utilize machine learning and high-dimensional datasets. We have previously used the deep learning method Mask R-CNN for segmentation of individual leaves of arabidopsis plants so that the shape, size, color, and other properties of leaves can be measured. However, for a time-series-based image dataset, leaf segmentation is done per image and the identifiers for each leaf are not consistently applied over time, so measurements cannot be grouped by leaf. To address this issue, we plan on releasing a new method for linking segmented structures over time using an optimal assignment approach. While we are developing this method for linking measurements of arabidopsis leaves, we plan on developing a new submodule for more general time-series-based image analysis methods. A major bottleneck in utilizing machine learning and deep learning methods is the availability of annotated data for testing and validation of machine learning methods. One focus of PlantCV development over this next year is the development of semi-automated data annotation tools in PlantCV to better utilize machine learning and deep learning methods. Objective 2: Develop a highly scalable, platform-agnostic analysis workflow and data management system for plant image analysis. We have thoroughly tested the Dask-based distributed computing framework in PlantCV on our local cluster (HTCondor), but we plan to do further testing on other systems, including in cloud-based deployments. 
We also plan on improving other areas of the PlantCV metadata processor that works upstream of the parallelization system to enable more complex analyses that utilize groups of images rather than single image inputs. Objective 3: Engage PlantCV users and developers from the public and private sector through hands-on workshops and by developing interactive documentation. Current PlantCV documentation (static and interactive) includes tutorials that cover the major types of technologies that PlantCV supports. However, we find that potential users who investigate whether PlantCV is a useful tool for their work often have a particular use case in mind and can struggle to find a tutorial that matches their particular problem well. We plan to expand the set of examples in the PlantCV tutorial gallery to include more use cases (e.g. measuring plant growth curves, analyzing a variety of plant species, etc.). We also plan to work with PlantCV users to host tutorials from their work, connected to publications when possible. We will also organize an in-person hands-on PlantCV workshop at the 2022 NAPPN Annual Conference (University of Georgia), and all materials for this workshop will be made available online.

Impacts
What was accomplished under these goals? Objective 1: A major aim of Objective 1 is to add support in PlantCV for additional image data types, in particular high-dimensional datasets that contain spectral measurements beyond the typical color and grayscale image data types. For example, multispectral/hyperspectral imaging provides information for a wide range of discrete wavelengths both within and outside of the visible light range and can provide insights into the biochemistry of a plant or tissue without prohibitively costly metabolomic and biochemical profiling. Other examples include thermal imaging that can be used to measure leaf temperature and specialized imaging systems for measuring chlorophyll fluorescence that are used to measure photosynthetic parameters. In the previous reporting period, a new submodule for working with hyperspectral image data was added to PlantCV that utilizes a custom data structure for indexing and labeling the spectral information contained within each image. The PlantCV spectral data structure can be used to extract spectral indices, many of which are built into PlantCV. In the current reporting period, the hyperspectral analysis tools were updated to support multiple common data formats and now support raw data in both band interleaved by line (BIL) and band sequential (BSQ) data. A new visualization tool for inspecting hyperspectral data was created to plot the histogram of reflectance values for a set of wavelengths input by a user. While hyperspectral datasets contain deep information content, they are also large (typically in the 1-10 GB range currently) due to the detailed spectrographic information. Multispectral datasets that contain more targeted spectral information can be used to assay specific biological properties while keeping dataset sizes smaller. The PlantCV hyperspectral submodule is flexible and can work with any multispectral datasets where spectral information can be labeled. 
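The difference between the two raw layouts named above (BIL and BSQ) comes down to the order in which band, row, and column are flattened. The sketch below shows only that index arithmetic on a tiny flat list; it does not use the PlantCV reader, and the function names are hypothetical.

```python
def bsq_offset(band, row, col, n_rows, n_cols):
    """Band sequential: each band is stored as one complete image."""
    return (band * n_rows + row) * n_cols + col


def bil_offset(band, row, col, n_bands, n_cols):
    """Band interleaved by line: each image row stores one line per band."""
    return (row * n_bands + band) * n_cols + col


def bsq_to_bil(flat, n_bands, n_rows, n_cols):
    """Reorder a flat BSQ cube into BIL order."""
    out = [0] * len(flat)
    for band in range(n_bands):
        for row in range(n_rows):
            for col in range(n_cols):
                src = bsq_offset(band, row, col, n_rows, n_cols)
                dst = bil_offset(band, row, col, n_bands, n_cols)
                out[dst] = flat[src]
    return out


# Toy cube with 2 bands, 2 rows, 3 columns; each value encodes its own
# position as band*100 + row*10 + col so the reordering is easy to read.
cube_bsq = [b * 100 + r * 10 + c for b in range(2) for r in range(2) for c in range(3)]
cube_bil = bsq_to_bil(cube_bsq, n_bands=2, n_rows=2, n_cols=3)
print(cube_bil)
```

Reading a pixel spectrum from BIL data touches one contiguous line per band within a row, while BSQ keeps each wavelength image contiguous, which is why a reader has to know the interleave before indexing the raw file.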
In the current reporting period, two new methods for building multispectral datasets were added to PlantCV. In multi-modal imaging systems, each camera may have a different resolution and field of view, making it difficult to analyze the data together as a single dataset. To address this challenge, we added methods to PlantCV to align images and adjust for differences in size and perspective and then fuse the resulting data into a multispectral dataset. For example, an imaging system with a color (RGB) camera and a near-infrared camera can now produce a simple multispectral dataset (RGB+NIR) and utilize existing PlantCV methods for calculating compatible spectral indices, such as Normalized Difference Vegetation Index (NDVI). The new photosynthesis submodule (see below) can also import multispectral datasets from systems that support spectral measurement protocols, and these datasets can also take advantage of the tools in PlantCV's hyperspectral submodule to calculate indices such as Anthocyanin Reflectance Index, Chlorophyll Index-Rededge, and other indices. A new submodule for methods related to the analysis of photosynthetic properties was created and supports imaging systems that measure pulse amplitude modulation chlorophyll fluorescence. These data are read from raw data sources and utilize n-dimensional labeled arrays to organize the chlorophyll fluorescence images into measurement and time series. Functions that work with these data can analyze and plot fluorescence induction curves and can measure photosynthetic properties such as the maximum quantum efficiency of photosystem II (PSII; Fv/Fm), PSII operating efficiency (Fq'/Fm'), and non-photochemical quenching (NPQ). Objective 2: A major aim of Objective 2 is to develop a framework for distributed, highly parallel, and platform-agnostic computing in PlantCV. 
In the current reporting period, the PlantCV multiprocessing method in the parallel submodule that only supported local parallelization was replaced with a redesigned framework that utilizes Dask (https://dask.org/) and Dask-Jobqueue (https://jobqueue.dask.org/) as a flexible approach for deploying PlantCV workflow analyses in parallel on multiple common types of distributed computing infrastructures, including HTCondor, LSF, PBS, SGE, SLURM, and others, while also maintaining local computing capabilities. Image analysis throughput can now be improved by distributing workloads across machines in a cluster rather than being limited to the resources available on a single machine. Parallel workflow analysis can now be configured using a configuration file (with a provided template) rather than complex command-line options. We also made improvements to the PlantCV measurement collection and reporting system. Previously, only a single set of measurements could be collected for a single image, which was limiting when there was a need to run an analysis method (e.g., shape, color, etc.) more than once (e.g., multiple plants, seed, etc.). To address this issue, we updated the PlantCV measurement collection system and analysis functions to support hierarchical measurement organization where measurements can be grouped by a sample identifier. For example, for images with multiple plants, measurements can be grouped by plant, which simplifies the construction of PlantCV analysis workflows. Objective 3: We developed PlantCV as a Python package to make it both a powerful tool for users with a background in bioinformatics and computer science and a relatively easy-to-use tool for plant biologists and other target audiences with less programming experience. 
We recognize that one of the pain points of getting started with PlantCV for people without experience in installing software of this nature is that getting everything set up correctly can be potentially frustrating before the user even knows if the tool works for their research question or interest. In the previous reporting period, we addressed this issue by developing interactive documentation resources that utilize the cloud-based Binder (https://mybinder.org/) environment. PlantCV notebooks available in Binder allow users to interactively use example PlantCV methods and workflows with sample data in an environment that is preconfigured with all the necessary software. While the modularity of PlantCV allows for flexibility for designing workflows for diverse image analysis problems, the availability of multiple approaches to building workflows can create a steep learning curve. In the current reporting period, we expanded the set of interactive documentation available in PlantCV to provide more examples of analyses of different types of biological samples (e.g., seeds, roots, leaf scans, etc.). To make it easier for users to find example tutorials that are relevant to their data, we reorganized PlantCV tutorials into a gallery that displays a virtual card for each tutorial. Each tutorial card displays an image that is representative of the tutorial, a descriptive title, keyword tags, and links to a static display of the tutorial and the interactive environment. The cards can be filtered dynamically by keyword to help users find relevant tutorials. Although the COVID-19 pandemic prevented us from holding an in-person workshop, we held our second annual hands-on workshop virtually at the 2021 NAPPN annual conference. The workshop had 45 participants and was recorded and made available for asynchronous participation. All of the materials remain available through the Binder environment: https://github.com/danforthcenter/plantcv-nappn2021-workshop. 
We also led three additional virtual workshops through the PhenomeForce Fridays Hands-On Workshop Series in Fall 2020 and Spring 2021 (https://phenome-force.github.io/PhenomeForce/). These workshops provided hands-on, interactive walkthroughs of both image analysis with PlantCV and the setup and use of Raspberry Pi cameras for plant imaging for phenotyping.
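The spectral index and chlorophyll fluorescence parameters named under Objective 1 follow standard textbook definitions, and a small worked example may make them concrete. The sketch below is not PlantCV code: the function names and pixel values are hypothetical, and it uses only the Python standard library on nested lists, whereas PlantCV itself works with array data.

```python
import math


def ndvi(nir, red):
    """Normalized Difference Vegetation Index for one pixel: (NIR - Red) / (NIR + Red)."""
    total = nir + red
    # Pixels with no signal in either band have no defined index.
    return (nir - red) / total if total != 0 else math.nan


def fv_fm(f0, fm):
    """Maximum quantum efficiency of PSII from dark-adapted F0 and Fm: (Fm - F0) / Fm."""
    return (fm - f0) / fm


def fq_fm_prime(f_prime, fm_prime):
    """PSII operating efficiency from light-adapted F' and Fm': (Fm' - F') / Fm'."""
    return (fm_prime - f_prime) / fm_prime


def npq(fm, fm_prime):
    """Non-photochemical quenching: (Fm - Fm') / Fm'."""
    return (fm - fm_prime) / fm_prime


# Hypothetical 2x2 reflectance images from aligned NIR and red channels.
nir_band = [[0.50, 0.60], [0.40, 0.0]]
red_band = [[0.10, 0.20], [0.40, 0.0]]

# Apply the index pixel by pixel to build a 2D NDVI image.
ndvi_image = [
    [ndvi(n, r) for n, r in zip(nir_row, red_row)]
    for nir_row, red_row in zip(nir_band, red_band)
]

# Hypothetical fluorescence intensities for a single pixel.
print(fv_fm(f0=200, fm=1000))                 # dark-adapted measurement
print(fq_fm_prime(f_prime=300, fm_prime=500))  # light-adapted measurement
print(npq(fm=1000, fm_prime=500))
```

The NDVI calculation only makes sense once the two camera images have been aligned to the same pixel grid, which is the role of the image alignment and fusion methods described above.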

Publications

Type: Conference Papers and Presentations Status: Other Year Published: 2021 Citation: Schuhl H, Gehan M, Fahlgren N. 2021. Workshop: Image analysis in Python with PlantCV. In: 2021 NAPPN Annual Conference. North American Plant Phenotyping Network.
Type: Journal Articles Status: Published Year Published: 2021 Citation: Casto AL, Schuhl H, Tovar JC, Wang Q, Bart RS, Fahlgren N, Gehan MA. 2021. Picturing the future of food. The Plant Phenome Journal 4. DOI: 10.1002/ppj2.20014.
Type: Conference Papers and Presentations Status: Other Year Published: 2021 Citation: Fahlgren N. 2021. Time series-based analysis of Raspberry Pi image stacks. In: IPPN Affordable Phenotyping Workshop: Plant Phenotyping with Minicomputers and Low-Cost Cameras.
Type: Conference Papers and Presentations Status: Other Year Published: 2021 Citation: Tovar J, Gehan M. 2021. Workshop: Measurement of plant phenotypes with low-cost Raspberry Pi computers and cameras. In: 2021 Fridays Hands-On Workshop Series. PhenomeForce. https://www.youtube.com/watch?v=4mT7I6PFNa4&t=2511s
Type: Conference Papers and Presentations Status: Other Year Published: 2021 Citation: Berry J, Fahlgren N. 2021. Workshop: Getting started with Raspberry Pi. In: 2021 Fridays Hands-On Workshop Series. PhenomeForce. https://www.youtube.com/watch?v=WC0oXKY47aU&t=4114s
Type: Conference Papers and Presentations Status: Other Year Published: 2021 Citation: Gehan, M.A. 2021. Utilizing Natural Variation and High-Throughput Phenotyping for Crop Improvement. In: UC Davis Annual Symposium
Type: Conference Papers and Presentations Status: Other Year Published: 2021 Citation: Gehan, M.A. 2021. Utilizing Natural Variation and High-Throughput Phenotyping for Crop Improvement. In: Michigan State University Annual Symposium
Type: Conference Papers and Presentations Status: Other Year Published: 2021 Citation: Gehan, M.A. 2021. Utilizing Natural Variation and High-Throughput Phenotyping for Crop Improvement. In: Washington State University Seminar Series
Type: Conference Papers and Presentations Status: Other Year Published: 2021 Citation: Schuhl H. 2021. Automated Leaf Angle Measurements in Grain Crops Using Skeleton Segmentation. In: 2021 NAPPN Annual Conference. North American Plant Phenotyping Network.
Type: Conference Papers and Presentations Status: Other Year Published: 2021 Citation: Gehan, M.A. 2021. What Phenotypes Matter? Open Challenges in Plant Phenomics with PlantCV. In: North American Plant Phenotyping Network Annual Conference.
Type: Conference Papers and Presentations Status: Other Year Published: 2020 Citation: Schuhl H, Gehan M, Fahlgren N. 2020. Workshop: An introduction to image analysis workflows with PlantCV. In: 2020 Fridays Hands-On Workshop Series. PhenomeForce. https://www.youtube.com/watch?v=fVoPjvgT400

Progress 08/01/19 to 07/31/20

Outputs
Target Audience: Our research program aims to develop an open-source, modular software toolbox (PlantCV: https://plantcv.danforthcenter.org/) to enable high-throughput plant phenotyping (measuring or monitoring plant physical, physiological, and morphological properties) work in research, breeding, and crop management activities. Our efforts to date have largely focused on the application of phenotyping technologies for research purposes, particularly in the context of controlled-environment (e.g. greenhouse and growth chambers) systems. But this work is a critical starting place to build robust, foundational tools and frameworks that will enable expansion to other application areas in the future. Additionally, the tools we are developing enable researchers to accelerate gene discovery and pre-breeding work by providing a robust framework for the use of multi-modal image data to non-destructively and quantitatively measure plant phenotypes at a high spatial and temporal resolution for large populations of plants. To measure the reach and impact of PlantCV to our target audiences, we monitor several usage statistics for the PlantCV software package and web-based resources. We manage a Twitter account (@plantcv) to disseminate general information and updates about PlantCV software to the community and currently reach more than 1,060 followers. PlantCV software is available as source code through the GitHub platform where it was downloaded ~1,500 times (or ~4 times per day on average) over the past project year. PlantCV is also available as an easy-to-install package on the Python Package Index and through Bioconda where it was downloaded a combined 30,000+ times over the past project year. Over the past project year, the PlantCV website (featuring our Twitter feed, pre-recorded seminars/webinars about PlantCV, and publication lists) and documentation pages had thousands of visitors. 
About 30% of website visitors were from the US (twice as high as the next highest country, India) and all 50 states (10 or more visitors from 46 states), but we also had worldwide reach. Finally, in the past project year, seven publications/preprints that utilized PlantCV were published. Current and potential users of PlantCV software are a key target audience, and a major objective of this project aims to provide hands-on training opportunities. Through the Phenome 2020 conference in February 2020, we held a full-day workshop (see accomplishments for more details) for PlantCV and other image analysis techniques. Our workshop had 42 participants (not including instructors), 75% from university, non-profit, and government institutions, and 25% from the private sector. Participants included early career researchers (6 postdoctoral researchers and 3 PhD students). The workshop was also a training opportunity for the instructor staff, which also represented a range of career stages (3 faculty, 1 PhD student, 3 postdoctoral researchers, and 2 research staff). Changes/Problems: Fortunately, we were able to hold our first annual hands-on workshop at the Phenome 2020 conference before the major shutdowns and limitations on travel induced by the COVID-19 pandemic. At the Phenome 2020 conference, it was announced that there would not be a Phenome 2021 conference organized by the American Society of Plant Biologists. Therefore, for at least the second project year we will need to find a new home for the annual workshop. Potential options include a replacement conference for Phenome organized by the North American Plant Phenotyping Network (NAPPN), of which co-PD Gehan was recently elected to the executive board, or one of several other plant science or agriculture-focused conferences. However, under the current COVID-19 situation in the US, it is unlikely that there will be any in-person scientific meetings in the first half of the next reporting period. 
While we are hopeful that COVID-19 will be under control in 2021, we will plan for a contingency where we need to hold one or more alternative workshops virtually. We have experience with teaching PlantCV remotely and will be participating in a phenomics workshop series this Fall that will give us a good insight into how well we can translate our workshop to an online experience. A potential advantage of hosting the workshops online is that we could likely reach many more participants. Despite this uncertainty, we do not anticipate that this will be a major issue overall, and we have the resources in place to host the workshop(s) in any number of formats successfully. What opportunities for training and professional development has the project provided? Participants of our first annual phenotyping workshop were trained in image analysis techniques using PlantCV, sorghum panicle counting from aerial imagery using deep learning, and hyperspectral image analysis (see Target Audience for participant information). The workshop was also a training opportunity for early career researchers on the instructor team who had the opportunity to help prepare and organize training materials for a hands-on workshop using cloud-based computing resources and real-time troubleshooting with participants. How have the results been disseminated to communities of interest? Work on PlantCV during this reporting period contributed to a peer-reviewed publication. In addition to the workshop organized by this project, a project team member was also invited to present her work on PlantCV at the Phenome 2020 conference, which was attended by members of the plant phenotyping community in North America and included members of the international community through the International Plant Phenotyping Network (IPPN). PDs Gehan and Fahlgren remotely taught a hands-on class on image analysis using PlantCV to a group of PhD students at the University of Missouri-Columbia. 
PD Fahlgren presented a webinar to 91 participants on PlantCV as part of the CyVerse Focus Forum Webinars series. Frequent updates about PlantCV developments are announced through our Twitter social media account. Additionally, we post updates to the code base frequently on GitHub, which always has the most updated version of PlantCV. Users post questions and feature requests to the PlantCV team on GitHub, but we also use GitHub Issues to post public comments, questions, and discussions about potential directions we are thinking about, or changes we are planning, and open these issues for public input. What do you plan to do during the next reporting period to accomplish the goals? Objective 1: The foundational infrastructure for working with hyperspectral data has been added to PlantCV. Over the next year, we will expand the capabilities for analyzing hyperspectral data by connecting PlantCV with tools developed by our collaborator, Alina Zare. The Zare lab has a suite of tools for classification (finding regions of an image that match spectral patterns), spectral unmixing (deconvoluting spectral data in an image to predict what the component materials in view were), and endmember detection (determining what the spectral patterns of individual materials are in an image with mixing). We have worked with the Zare lab to package their tools in the Python Package Index and Anaconda Cloud systems and will work to utilize these tools in PlantCV. We plan to expand our work with deep learning tools for instance segmentation (detecting and outlining individual objects in an image). We have demonstrated that instance segmentation works well for arabidopsis leaves and that we can utilize these data to link objects over time, but we aim to test and refine these tools for work in other plant species that will present unique challenges compared to arabidopsis. 
We also plan to explore the use of image-to-image variation created by plant movement over time as a tool to isolate plants (or plant parts) that overlap, which is a challenge in image analysis using a 2D field of view. Objective 2: During the current reporting period we investigated a variety of workflow systems for potential integration with PlantCV to replace our existing parallel processing system with a system that is agnostic to computational infrastructure. Of those that we investigated, Dask (https://dask.org/) is one that accomplishes our goals while also being easier to integrate with PlantCV because it is a Python-based package. During the next reporting period, we plan to replace the current parallelization system in PlantCV with Dask so that PlantCV will be capable of distributed computing on multiple infrastructures with minimal user configuration. Objective 3: Current PlantCV documentation (static and interactive) includes 12 tutorials that cover the major types of technologies that PlantCV supports. However, we find that potential users that investigate whether PlantCV is a useful tool for their work often have a particular use-case in mind and can struggle to find a tutorial that matches well for their particular problem. We plan to reorganize and expand the PlantCV tutorial resource into a gallery where each tutorial is based on a use-case (e.g. measuring plant growth curves, counting the number of seeds, etc.) rather than a technology (RGB, hyperspectral, thermal, etc. imaging). A representative thumbnail, a title, and keywords will be browsable on the gallery page, and each tutorial will link out to an interactive Jupyter Notebook/Binder session (or static notebook if preferred).
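The parallelization model discussed under Objective 2 amounts to mapping one per-image workflow over many images and collecting the results. The sketch below illustrates that pattern with the standard library's concurrent.futures as a stand-in for Dask, whose futures interface extends the same API; it does not use Dask or PlantCV, and the workflow function, image names, and mask values are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def analyze_image(image):
    """Hypothetical per-image workflow: count foreground pixels in a binary mask."""
    name, mask = image
    area = sum(sum(row) for row in mask)
    return {"image": name, "area_px": area}


def run_parallel(images, workers=2):
    """Run the same workflow on every image using a pool of workers."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in the same order as the inputs.
        return list(pool.map(analyze_image, images))


# Hypothetical dataset: (file name, binary mask) pairs.
images = [
    ("plant_a.png", [[0, 1], [1, 1]]),
    ("plant_b.png", [[0, 0], [1, 0]]),
]

results = run_parallel(images)
for record in results:
    print(record["image"], record["area_px"])
```

Because each image is analyzed independently, the same workflow function can be handed to a different executor without modification, which is the property that makes a scheduler-agnostic backend attractive.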

Impacts
What was accomplished under these goals? Objective 1: Hyperspectral imaging is an emerging technique for plant phenotyping. Unlike RGB cameras that aggregate visible light (~400-700nm) in the red, green, and blue wavelengths, hyperspectral cameras provide information for a wide range of discrete wavelengths that extend outside the visible spectrum (>700nm). Like RGB cameras, hyperspectral cameras capture reflectance data in a spatial context, allowing reflectance over different parts of the plant to be observed. The light reflecting from plants is complex and dependent on numerous biophysical, biochemical, and atmospheric interactions. The visible range of light is mostly influenced by leaf pigments, while near-infrared reflectance is dependent on leaf structure and biochemical composition, including water content. Hyperspectral profiles can provide insights into the biochemistry of a plant or tissue without prohibitively costly metabolomic and biochemical profiling. Because hyperspectral cameras capture detailed spectrographic information, resulting data files are large (typically ~1-10 GB currently) and cannot be visualized with typical software that is used for image files. During the current reporting period, a new submodule for working with hyperspectral image data was added to PlantCV. The hyperspectral submodule can read data from binary raw data files (band-interlaced or sequential) and corresponding metadata files in the ENVI (Environment for Visualizing Images) format currently. In Python, images are typically represented as NumPy n-dimensional arrays (two spatial and one spectral dimensions), which is a common data format that allows image data to be analyzed using the large set of scientific tools available within the language. However, unlike typical RGB or grayscale data, the interpretation of hyperspectral data requires knowledge of the wavelengths corresponding to each of the spectral channels in the image file. 
To address this need, we developed a new data structure for working with hyperspectral data in PlantCV. PlantCV hyperspectral data objects still work with the data in NumPy n-dimensional array format, but around this core data structure we incorporate important attributes of the hyperspectral data cube, including the wavelengths associated with each spectral channel. The PlantCV hyperspectral data object structure allows the user to retrieve channels associated with specific wavelengths on-demand. Additionally, we added new analysis methods to PlantCV that utilize the new hyperspectral data objects. These methods include a calibration function that converts raw hyperspectral intensities to reflectance values, an analysis method that measures the average spectral reflectance curve of an object of interest (e.g. plant, leaf, seed, etc.), and a suite of built-in vegetative indices. Vegetative indices (e.g. NDVI) are common tools for reducing the high dimensionality of hyperspectral data. The built-in vegetative indices assess whether the user-provided hyperspectral data include the wavelengths needed to calculate the given index and return the computed 2D array if applicable. In addition to hyperspectral data, new methods were added during the current reporting period to read and analyze thermal infrared image data (the FLIR format is currently supported). The PlantCV methods that work with thermal image data can convert the input images between temperature values and image-based values as needed for analysis, but ensure that the final results presented to the user are in temperature units. During the reporting period, work was done with machine learning methods in two areas. First, a new method to segment images into discrete classes based on color using Gaussian Mixture Models (GMM) was developed (currently under review for the next PlantCV release). 
The new unsupervised GMM-based classification method does not require users to label training data and is complementary to our existing naive Bayes classification method, which requires user-defined training data. Second, we continued our work with Mask R-CNN, an instance segmentation-based deep learning method for detecting and segmenting individual objects (e.g. leaves) in an image. We utilized images of synthetic arabidopsis plants that were computationally generated with corresponding labels for each leaf (Ward et al. 2018; https://arxiv.org/abs/1807.10931) to retrain a Mask R-CNN framework. The resulting model can identify and separate leaves in arabidopsis image data we collected in-house with high accuracy, even though our data was not part of the training set, suggesting that synthetic image data is a potentially useful route to quickly generate training data with less hands-on user input. Using these individual leaf predictions, we developed a PlantCV framework to link individually identified leaves between time points in a time series, allowing users to track leaf-level measurements (e.g. leaf area) over time (currently in code review on GitHub). Objective 2: To make setup of PlantCV easier, and to better integrate with other available toolkits, we made PlantCV packages available through the Python Package Index (PyPI; https://pypi.org/project/plantcv/) and Bioconda (https://anaconda.org/bioconda/plantcv), which makes it possible to install PlantCV with a single command. To enable the use of PlantCV on public infrastructure, we configured PlantCV with an encapsulated environment for use on the CyVerse platform through the Visual and Interactive Computing Environment. Objective 3: We developed PlantCV as a Python package to make it both a powerful tool for users with a background in bioinformatics and computer science and a relatively easy-to-use tool for plant biologists and other target audiences with less programming experience. 
We recognize that one of the pain points of getting started with PlantCV for people without experience in installing software of this nature is that getting everything set up correctly can be potentially frustrating before the user even knows if the tool works for their research question or interest. To address this issue, we developed a new set of online, interactive documentation to complement our robust set of static documentation. The interactive documentation environment utilizes the cloud-based Binder platform to encapsulate a PlantCV environment that includes our software and all the packages it depends on and includes Jupyter Notebook versions of the PlantCV documentation and all of the sample data used throughout it. A user can now navigate with their web browser to an interactive version of most of the PlantCV documentation and have immediate access to a functioning environment with sample data without the need for any setup. To engage with users directly, we held our first annual hands-on workshop at the Phenome 2020 conference. The all-day Digital Phenotyping Workshop hosted 42 participants and was divided into three sessions. For all sessions, we set up cloud-based Jupyter Notebook systems for each participant so that installing software and configuring local computers was not a barrier. In Session 1 we did a full walkthrough with participants on using PlantCV from start to end. We worked with a dataset of images of virus-infected plants to develop an analysis workflow for measuring infection severity, deploy that workflow in parallel for the full dataset, and analyze the resulting measurements. In Session 2, our collaborator Jordan Ubbens (PhD student, U. of Saskatchewan) provided a hands-on walkthrough of counting sorghum panicles from aerial imagery using a deep learning technique. In Session 3, our collaborators Alina Zare (Professor, U. of Florida) and Susan Meerdink (Assistant Professor, U. 
of Iowa) provided a hands-on walkthrough of hyperspectral image analysis. All workshop materials, including instructions for how to set up environments similar to what we provided for the workshop, are available: https://github.com/danforthcenter/phenome2020-workshop
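The hyperspectral data object described under Objective 1 pairs the image array with the wavelength of each spectral channel so that channels can be retrieved by wavelength. A minimal sketch of that idea follows. It is an illustration only, not the PlantCV class: the name SpectralCube, its methods, and the sample values are hypothetical, and nested lists stand in for the NumPy array that PlantCV uses.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SpectralCube:
    """Hypothetical container: pixel data plus the wavelength of each channel."""

    data: List[List[List[float]]]  # indexed as data[row][col][band]
    wavelengths_nm: List[float]    # one wavelength per band, in nanometers

    def band_index(self, wavelength_nm: float) -> int:
        """Index of the channel whose wavelength is closest to the request."""
        return min(
            range(len(self.wavelengths_nm)),
            key=lambda i: abs(self.wavelengths_nm[i] - wavelength_nm),
        )

    def band(self, wavelength_nm: float) -> List[List[float]]:
        """2D image for the channel nearest the requested wavelength."""
        i = self.band_index(wavelength_nm)
        return [[pixel[i] for pixel in row] for row in self.data]

    def covers(self, low_nm: float, high_nm: float) -> bool:
        """True if the requested range lies within the measured wavelengths."""
        return min(self.wavelengths_nm) <= low_nm and high_nm <= max(self.wavelengths_nm)


# Hypothetical cube: 1 row x 2 columns of pixels, 3 bands each.
cube = SpectralCube(
    data=[[[0.10, 0.05, 0.50], [0.20, 0.10, 0.60]]],
    wavelengths_nm=[550.0, 670.0, 800.0],
)

red = cube.band(668)  # nearest measured channel is 670 nm
print(red)
print(cube.covers(670, 800), cube.covers(400, 800))
```

A range check like covers() mirrors the behavior described above, where a built-in index first confirms that the data include the wavelengths it needs before computing a result.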

Publications

Type: Journal Articles Status: Published Year Published: 2020 Citation: Tovar JC, Quillatupa C, Callen ST, Castillo SE, Pearson P, Shamin A, Schuhl H, Fahlgren N, Gehan MA. 2020. Heating quinoa shoots results in yield loss by inhibiting fruit production and delaying maturity. The Plant Journal: For Cell and Molecular Biology. DOI: 10.1111/tpj.14699.
Type: Conference Papers and Presentations Status: Other Year Published: 2020 Citation: Schuhl H. 2020. PlantCV- Open Source Plant Phenotyping Software. In: Phenome 2020. American Society of Plant Biologists.
Type: Conference Papers and Presentations Status: Published Year Published: 2020 Citation: Fahlgren N. 2020. Webinar: PlantCV: A Modular Image Analysis Toolkit for Building Plant Phenotyping Workflows. In: Focus Forum Webinars. CyVerse. https://cyverse.org/webinar-plantcv-image-analysis-toolkit-for-building-plant-phenotyping-workflows.
Type: Journal Articles Status: Submitted Year Published: 2020 Citation: Casto A, Schuhl H, Tovar JC, Wang Q, Bart RS, Fahlgren N, Gehan MA. (2020). Picturing the Future of Food. Submitted to The Plant Phenome Journal.
Type: Conference Papers and Presentations Status: Other Year Published: 2020 Citation: Gehan M. 2020. Differential Heat Stress Response in Quinoa Roots and Shoots. In: International Quinoa Research Symposium 2020. Washington State University.
Type: Conference Papers and Presentations Status: Other Year Published: 2019 Citation: Gehan M. 2019. Webinar: Democratizing High-Throughput Phenotyping with PlantCV. In: EU-US Big Data Joint Webinars on Digital Agriculture. International Plant Phenotyping Network.
Type: Journal Articles Status: Other Year Published: 2019 Citation: Gehan M. 2019. Utilizing Natural Variation and High-Throughput Phenotyping for Crop Improvement. In: McGill University Seminar Series. McGill University.
Type: Conference Papers and Presentations Status: Other Year Published: 2019 Citation: Gehan M. 2019. Utilizing Natural Variation and High-Throughput Phenotyping for Crop Improvement. In: Plant Evolutionary Genomics Conference. Sitges, Spain (Keynote).
Type: Conference Papers and Presentations Status: Other Year Published: 2019 Citation: Gehan M. 2019. Utilizing Natural Variation for Crop Resilience Under Temperature Stress. In: Phenomics Session: ASPB Annual Meeting (Session Chair).