BTT EAGER: Unified Big Data in Genomics and Phenomics for Plant Breeding

Sponsoring Institution

National Institute of Food and Agriculture

Project Status

COMPLETE

Funding Source

AFRI COMPETITIVE GRANT

Reporting Frequency

Annual

Accession No.

1018267

Grant No.

2019-67013-29008

Cumulative Award Amt.

$300,000.00

Proposal No.

2018-09048

Multistate No.

(N/A)

Project Start Date

Dec 1, 2018

Project End Date

Nov 30, 2021

Grant Year

2019

Program Code

[A5173]- Early Concept Grants for Exploratory Research (EAGERs) to Develop Breakthrough Ideas and Enabling Technologies to Advance Crop Breeding and Functional Genomics

Recipient Organization
KANSAS STATE UNIV
(N/A)
MANHATTAN,KS 66506

Performing Department
Plant Pathology

Non Technical Summary
Pairing phenotype data with genomic data has the potential to revolutionize plant breeding. Currently, plant breeding seeks to identify key combinations of genes that produce optimal traits such as grain yield. The plant breeding process for many crops is long and tedious, with thousands of plants advanced several generations only to be discarded at initial evaluations when there is sufficient seed to conduct test plots. Phenomics is an emerging field that collects measurable trait data, such as plant height and color, that can be paired with genomic data to select superior plants. The optimal integration of these two fields has not been fully explored, and one overlooked area has been field-based single plant phenotyping within early generations. High-throughput evaluation of single plants that enables breeders to select as well as advance populations during early generations could increase the rate of genetic gain and resource efficiency within plant breeding by allowing plant selections to occur two or more years earlier than in traditional breeding. This project will focus on advancing the field of high-throughput phenotyping to the scale of single plants using unmanned aerial vehicles (UAVs) equipped with high-resolution sensors. Using digital imagery throughout the growing season, data from single plants will be collected and evaluated. Methods to evaluate single plants from digital imagery will be developed in open-source pipelines and repositories allowing for real-time data sharing and evaluation. The initial population evaluated will be an association mapping panel of publicly available lines, creating an artificial segregating population to refine single plant analysis methods. Along with evaluating the association mapping panel, segregating lines from a breeding program will also be assessed.
A key component of this research will be the open-source nature of the project and the expected deliverables of documented research guides for researchers to use within their own programs. The methods developed will also be relevant for researchers who are screening populations for key phenotype changes, such as those from mutation breeding, doubled haploid production, or genetic engineering, where there is insufficient seed to evaluate full-size test plots. This project will aid researchers in developing higher-yielding plants to provide for a growing world population in a more sustainable manner.

Animal Health Component

34%

Research Effort Categories

Basic

33%

Applied

34%

Developmental

33%

Classification

Knowledge Area (KA)	Subject of Investigation (SOI)	Field of Science (FOS)	Percent
201	1540	1081	100%

Knowledge Area
201 - Plant Genome, Genetics, and Genetic Mechanisms;

Subject Of Investigation
1540 - Hard red winter wheat;

Field Of Science
1081 - Breeding;

Keywords

single plant

high-throughput phenotyping

early-generation testing

wheat

genomics and phenomics

Goals / Objectives
The major goal of this project is to extend existing small plot phenotyping methods to single plant, field-based analysis. Developing methods to monitor single, segregating plants allows for the potential of increased accuracy and earlier selections in breeding populations. For many crops, selecting single plants in early generations could be performed two or more years before selection would typically be undertaken. This could allow for substantial gains to be made in crop breeding as well as increased resource efficiency. Specifically, this project aims to:
1. Adapt and extend high-throughput phenotyping (HTP) methods to the scale of single plants from full plot or field size.
2. Develop open source pipelines that extract biologically relevant data from images, and store and curate HTP data in well documented open source repositories.
3. Identify key traits that can be used to predict plant yield, such as growth rate and plant yield.
4. Combine genomic and phenomic data to unravel the genotype to phenotype (G2P) problem. Genomic techniques including quantitative trait loci (QTL) mapping, association mapping, and genomic selection will be applied to better predict plant behavior and growth.
5. Create and distribute open source user tutorials to enhance scientists' knowledge and enable high-throughput phenotyping in their respective programs.

Project Methods
This project seeks to develop methodologies, tools, and pipelines to integrate genomics and field-based phenomics to single plant analysis. Our approach aims to revolutionize plant breeding and genomics studies with single plant field-based phenotyping that will enable multiple rounds of selection to increase the number of desirable traits at all stages of the plant breeding program from crossing to yield trials. The specific aims are:1. Adapt and extend HTP methods to the scale of single plants. We aim to a) extend current phenotyping methods to single plant level and overcome challenges of working with single plants. b) apply our methods to segregating plant material and develop an artificial population for reproducibility. a. We will use UAVs equipped with state-of-the-art high resolution cameras (6K, 30p imaging) to collect high temporal and spatial resolution image data. Ground control points will be used for georeferencing space planted single plants allowing image data to be accurately assigned to each plant, thus expanding phenotyping techniques from broad plot application to the specificity of each single plant. Along with georeferencing the data, another area to extend current phenotyping methods to single plants is the plant spacing. Our approach will start with space planted plants using precision single seed vacuum planter. This planting is currently implemented in breeding programs in early generations, allowing single space planted plants to be identified and selected. As our methods are refined as well as computing technology, the distance between plants may be adjusted to eventually merge single plant analysis with full plot research, allowing researchers to monitor single plants that are grown in an enclosed canopy. b. 
Within the breeding program, we will evaluate segregating material for ideotype by precisely identifying plants with GPS coordinates, and use image data collected along with breeder selection to develop models to select superior plants. While we will apply our methods to breeding populations, there are distinct challenges to phenotyping segregating populations that cannot be replicated or repeated. While this will provide a real-world test to our data pipelines and collection methods, in order to benchmark our methods, and determine if they can be applied to the breeding program, we will begin by looking at threshold traits such as disease resistance or plant height. In order to validate our methods in segregating populations we will use genotyping-by-sequencing to genotype segregating material in the second year of the project and compare our data to breeder selected plants. This data will be the first proof of concept of phenotyping within a segregating population and provide the basis for continued single plant analysis for more complex traits. We will also use a reference association mapping (AM) panel consisting of 300 lines representing nearly 100 years of breeding progress for reproducibility. These lines represent a large portion of the diversity found within public breeding programs and span a range for numerous morphological traits including plant height, flowering time, and grain yield. Using these fixed lines, which represent a segregating population, the ability to reproduce the same genetic lines each year will be preserved, allowing us to verify our methods across years in stable populations. We will randomly plant single plant genotypes in complete replicates of the AM panel each year for data collection including plant yield. In addition, full plots using the AM panel will also be planted allowing a comparison of single plant traits to full plot phenotypes. 
The AM panel has been previously genotyped, which will enable testing for genetic associations of the single plant phenotypes. 2. Develop open source pipelines that extract biologically relevant data from images, and store and curate HTP data in well documented open source repositories. Utilizing collected HTP data, we will develop automated pipelines to efficiently process raw images into phenotypic trait data that can be used for analysis. We will use open source and publicly available repositories to disseminate HTP single plant methodology in real time. Once images are collected on the camera, the data will be processed to record pertinent information about each image. Automated pipelines similar to published methods will be used to perform these operations quickly and efficiently. These pipelines will be used to add metadata to the photos as well as store metadata in databases. By storing metadata that includes position, time, and hardware information within open source databases that can be easily queried by users, scientists can identify data that may be useful to them without having to perform repeated experiments. Data will be collected from the lines throughout the season from emergence to physiological maturity and grain yield. All data will be uploaded to online databases that can be queried. 3. Identify key traits that can be used to predict plant yield, such as growth rate and plant yield. This work will aim to identify traits that can be detected in single plants that are correlated to grain yield. Many traits have been amenable to UAV applications and plant breeding, including canopy temperature and spectral reflectance used to calculate the normalized difference vegetation index. Utilizing phenotypic data collected throughout the growing cycle, we will investigate relationships between extracted phenotypic traits and ground truth data, from single plant grain yield to full plot yield, using the AM panel.
4. Create and distribute open source user tutorials to enhance scientists' knowledge and encourage them to replicate the process in their respective disciplines. As analysis pipelines are developed, these will be distributed as tutorials to increase capacity for high-throughput phenotyping. By making both the data and analysis pipelines publicly available, scientists from around the globe can utilize these data and methods to further their research as well as duplicate and improve on the methods for their particular crops. Our goal will be to develop automated pipelines that are capable of extracting phenotypic traits from image data. Currently, many methods to extract traits require significant user involvement, limiting the amount of information that can be processed; thus, automated pipelines will be a substantial addition to phenotyping. Additionally, this should increase the rate of knowledge discovery by allowing a wider community of scientists to examine, mine, and apply techniques of their disciplines to these data. 5. Combine genomic and phenomic data to unravel the genotype to phenotype (G2P) problem. Genomic techniques including quantitative trait loci (QTL) mapping, association mapping, and genomic selection will be applied to better predict plant behavior and growth. We will focus on combining genetic and phenotypic information using single plant traits that have been verified to correlate to full plot yield. We will identify QTL using genome-wide association analysis as well as predict trait performance using genomic selection. The ultimate goal of this project will be testing the integration of both phenomics and genomics for prediction and selection on single plants. With concurrent paired full-size yield plots, we will test the predictions from single plants against replicated yield testing, the current standard for advanced testing in breeding programs. 
While there have been some applications of previous research in these areas, they have not fully integrated methods within plant breeding programs, and our ultimate objective is to provide proof of concept as well as application within breeding programs.
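
As an illustration of the georeferencing step in aim 1, the sketch below assigns a georeferenced image point to the nearest space-planted plant. It is a minimal example that assumes plants sit on a regular square grid with a known origin and spacing in map units (metres). The function name `nearest_plant`, the grid dimensions, the 0.5 m spacing, and the `max_dist` cutoff are hypothetical choices for illustration, not parameters taken from the project.

```python
import math


def nearest_plant(x, y, origin=(0.0, 0.0), spacing=0.5,
                  n_rows=10, n_cols=10, max_dist=0.2):
    """Return (row, col) of the grid plant nearest to map point (x, y).

    Returns None when the point falls outside the grid or lies farther
    than max_dist from the nearest plant centre (for example bare soil).
    All default values are hypothetical and chosen only for illustration.
    """
    # Index of the closest grid node along each axis.
    col = round((x - origin[0]) / spacing)
    row = round((y - origin[1]) / spacing)
    if not (0 <= row < n_rows and 0 <= col < n_cols):
        return None
    # Distance from the point to that plant's nominal centre.
    centre_x = origin[0] + col * spacing
    centre_y = origin[1] + row * spacing
    if math.hypot(x - centre_x, y - centre_y) > max_dist:
        return None
    return row, col


if __name__ == "__main__":
    print(nearest_plant(1.04, 0.49))  # close to a plant centre
    print(nearest_plant(0.26, 0.26))  # midway between plants
```

In practice the cutoff would depend on plant spacing and canopy size, and a real pipeline would first transform pixel coordinates to map coordinates using the ground control points.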

Progress 12/01/18 to 09/21/21

Outputs
Target Audience: The target audience is plant breeders and scientists working on any plant species that needs improvement. We anticipate these methods being usable for commercial crops as well as minor and orphan crops that are often resource limited. Additionally, we believe this research will be useful for all stages of the plant breeding program from initial crossing and inbreeding to final line and cultivar evaluation. Changes/Problems: The most significant problem has been the ongoing COVID-19 pandemic and its repercussions for research. This has slowed analysis and delayed the implementation of internships; however, all original grant objectives have been achieved. What opportunities for training and professional development has the project provided? This award has provided numerous opportunities for professional development and training. As a post-doc, PI Crain has had the opportunity to plan, manage, and facilitate scientific research, which has provided skills that will be useful as PI Crain transitions to a research assistant professor role. Post-docs Crain and Wu have also been able to attend professional conferences to develop stronger scientific skills. A significant mentoring experience was provided to two undergraduate scholars during the 2021 summer, where the undergraduate interns were introduced to high-throughput phenotyping, next-generation sequencing, data analysis, and scientific literacy. After completing a tutorial based on data collected from this project, each student evaluated a separate project culminating in developing a scientific research poster. From self-reported assessments, their greatest accomplishment throughout the internship was learning to code in R. How have the results been disseminated to communities of interest? In addition to presenting project results within laboratory and department meetings, a strong emphasis has been placed on peer-reviewed published results. 
This award has resulted in one book chapter and one anticipated manuscript that will reach a wide variety of plant researchers and scholars. What do you plan to do during the next reporting period to accomplish the goals? Nothing Reported

Impacts
What was accomplished under these goals? Plant breeding requires adequately phenotyping--identifying differences between plants and selecting for superior traits--among thousands of potential plants. This project worked toward developing quantitative and digital tools that can be applied to the life cycle of plant breeding from crossing plants to final evaluation of lines or cultivars. Specifically, we extended full plot high-throughput phenotyping that is available in later stages of the breeding cycle to single plants that represent material in the first few years of the breeding cycle that is either heterozygous or unable to produce sufficient seed for full plot replications. By providing plant breeders with a viable option to quantitatively select each generation, significant genetic gains could be achieved within the plant breeding cycle. This research and its results will be of interest to plant breeders for a wide range of plants including both annual and perennial plants that represent both self-pollinated and cross-pollinated species. This project has: Adapted high-throughput phenotyping (HTP) methodology from full plots to single plants. During two years, a diverse panel of 339 wheat lines was grown and phenotyped under multiple field replications of full plots and single plants. Both planting patterns were phenotyped using unmanned aerial vehicles (UAVs) to collect color and near infrared imagery 5-10 times throughout the growing season. In addition to HTP data collection, grain yield, plant height, thousand kernel weight, spike length, and spike number were recorded for each plot. This provided a method to compare single plant trait expression to full plot trait expression (more representative of farmer growing conditions) to determine whether the two were correlated. Developed open-source pipelines that extract biologically relevant traits from images. 
We have extended previously existing protocols for extracting HTP information from full plots to single plants. The original methods have been published in peer-reviewed articles, and our modifications will be made available through an upcoming peer-reviewed publication and open-source repositories (Products and Other Products). We anticipate that these products will be useful for plant breeders interested in implementing HTP in their own research projects. Identified key traits that can be used to predict plant yield. Comparing collected data from the single plants--which are representative of segregating, early-generation material in a breeding program--to full plots showed that several traits could be measured on single plants and would have stable trait expression in full plots. This suggests that selection could be performed in single plants, generations or years earlier than selection is possible in full yield plots. During the growing season, single plant normalized difference vegetation index (NDVI) values collected from aerial imagery were correlated to full plot yield. Full plot NDVI was positively correlated to full plot grain yield both years, indicating that selection strategies could be developed to select superior plants. This should directly increase the rate of genetic gain and the quality of plants that can be grown by farmers. Combined genomic and phenomic data to unravel the genotype-to-phenotype problem. Using existing genomic data, this project combined the collected HTP data with genomic data to identify locations on the genome that affect measured traits. As further validation that single plants can be accurately phenotyped for trait expression under farmer growing conditions, the reduced height genes were the most significant marker trait associations for both single plants and full plots. Spike length and number of spikelets per plant also had overlapping genetic regions for measurements on both single plants and full plots. 
This suggests that measurements taken on material that is segregating or in early generations can be used to select for trait expression in fixed lines. Created and distributed open-source user tutorials. Data have been documented and provided in the Dryad digital repository (Other Products). This work has laid a foundation for fundamentally altering the role of phenotyping in early, segregating generations. Rather than routinely practicing negative selection, i.e., removal of inferior plants such as those that are susceptible to disease, positive selection for traits such as grain yield could be practiced. This should allow for superior plants to be selected and advanced at all levels of the breeding cycle, resulting in higher performing cultivars that are released to farmers.
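
The comparison of single plant traits with full plot traits described above rests on correlating per-line values. The following is a minimal sketch of that idea using a Pearson correlation, assuming each data set has already been reduced to one value per line. The function names, the dictionary layout, and the numbers in the demo are invented for illustration and are not project data or the project's analysis code.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def correlate_by_line(single_plant, full_plot):
    """Correlate per-line values, using only lines present in both dicts."""
    shared = sorted(set(single_plant) & set(full_plot))
    xs = [single_plant[line] for line in shared]
    ys = [full_plot[line] for line in shared]
    return pearson_r(xs, ys)


if __name__ == "__main__":
    # Invented toy values, NOT project data: single plant NDVI and
    # full plot yield keyed by a made-up line identifier.
    ndvi = {"line_a": 0.62, "line_b": 0.70, "line_c": 0.55, "line_d": 0.81}
    plot_yield = {"line_a": 4.1, "line_b": 3.8, "line_c": 4.4, "line_e": 3.9}
    print(round(correlate_by_line(ndvi, plot_yield), 3))
```

Matching on line identifiers before correlating keeps lines that are missing from either planting pattern out of the estimate.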

Publications

Type: Journal Articles Status: Other Year Published: 2022 Citation: Crain, J., Wang, X., Evers, B., & Poland, J. Field-based single plant phenotyping for plant breeding. The Plant Phenome Journal.

Progress 12/01/19 to 11/30/20

Outputs
Target Audience: The target audience is plant breeders in the national and international community, with current results being presented in lab meetings. Changes/Problems: Over the course of the grant, we have completed two site years of data collection using a replicated, randomized complete block design. While our experimental design has been the same, we did increase plant spacing in 2020. The biggest unforeseen challenge has been the COVID-19 pandemic, which has limited manual data collection and the speed at which data can be collected. While we successfully completed the field season and collected all of the same traits in 2019 and 2020, 2020 data collection was scaled down to exclude sub-sampling per plot for time-intensive traits such as spikes per square meter. This could directly affect the precision of our 2020 phenotypic estimates. While sample processing speed was faster in 2020, labor restrictions (social distancing, limited number of individuals) have delayed final sample processing. In addition, the pandemic resulted in canceled plans to have undergraduate interns during the summer. While much of the grant has proceeded, this missed opportunity has prevented further exploration and analysis of the data sets. We have also taken a no-cost extension of the project and will extend the summer internships to Summer 2021, pending a suitable situation. What opportunities for training and professional development has the project provided? As a postdoc, PI Crain has also had the opportunity to develop project management, organizational, and reporting skills for maintaining scientific research. Before COVID-19, two undergraduate scholars from underrepresented groups were selected to be summer interns. Unfortunately, as the pandemic progressed, we were unable to move forward with the undergraduate internships for community safety. How have the results been disseminated to communities of interest? Current research activities have been presented in lab meetings. 
The second year of data will provide sufficient information to submit a peer-reviewed manuscript as well as to present at national and international society meetings. What do you plan to do during the next reporting period to accomplish the goals? We have applied for and received a 12-month no-cost extension. During that time, final analysis of data will occur, and a manuscript will be submitted for peer review. Final activities include: Specific Objectives: 1. Complete manual data recording. During the first year, threshing single plants was time prohibitive. A new threshing procedure using a new threshing machine has accelerated the rate of grain yield collection. While this has greatly increased sample throughput, COVID-19 restrictions have hampered our ability to process samples as expediently as possible. 2. Complete analysis of phenotypic and HTP data by: a. Correlating 2020 HTP and yield data for single plants and full plots. b. Investigating the genetic architecture of HTP traits through genome-wide association analysis. c. Evaluating whether trait performance can be estimated using genomic prediction. 3. Develop open source pipelines that extract biologically relevant data from images, and store and curate HTP data in well documented open source repositories. a. Document the data pipeline for image trait extraction. b. Prepare scripts and data sources for open repository submission as part of manuscript preparation. This will allow other scientists to recreate the analysis and modify scripts for their own projects.

Impacts
What was accomplished under these goals? Progress on specific goals and objectives 2019-2020: 1. Adapt and extend HTP methods to the scale of single plants. a. Develop a single plant panel of inbred lines for initial evaluation. In 2020, we completed a second site year of replicated data collection using a diverse panel of 339 public winter wheat lines. Single plants were started in the greenhouse and transplanted to the field in the fall of 2019. This population represents a range of phenotypes and diversity in inbred lines, creating an artificial segregating population that provides a proof-of-concept panel to phenotype without the challenges of segregation. After initial data from 2019 were analyzed, the plant spacing was expanded to a 70 cm grid during the 2020 growing season to enable better separation of single plants in HTP imaging. In addition to single plants, each variety was planted in a full plot nursery with two replicates adjacent to the single plant nursery. This allowed for evaluation of single plant phenotype expression compared to trait expression in the dense canopy of the yield plot and accurate measurement of grain yield. b. Collect HTP data from single plants. Throughout the 2020 growing season, HTP data were collected 10 times with an unmanned aerial vehicle (UAV) using multi-spectral imaging. In addition to HTP data, manually measured traits of heading date, number of spikes per plant, spike length, and spikelets per spike were collected. Due to COVID-19 labor and time constraints, some traits were not measured exactly as in 2019. For example, rather than collecting two sub-samples per plot, only one sample was obtained for number of spikes per square meter. Final yield data, including thousand kernel weight, were collected on all single plants and full plots. 2. Develop open source pipelines that extract biologically relevant data from images, and store and curate HTP data in well documented open source repositories. a. Collect HTP data from single plants. All HTP data collection was completed using a DJI M100 UAV with the MicaSense RedEdge-M multispectral camera, a DJI M600 UAV with the Zenmuse X5R digital camera, and a DJI M600 UAV with the Sony Alpha 7R III digital camera. Existing UAV HTP infrastructure was used to collect and curate multiple in-season measurements of single plants. Current laboratory procedures were used for metadata storage (including date and time of data collection), image naming, and database storage and retrieval. b. Extract data from HTP images of single plants. A Python-based image processing pipeline (github.com/xwangksu/bip) and a Python-based digital trait extraction pipeline (github.com/xwangksu/traitExtraction) were used for data analysis. The plant and plot level canopy reflectance trait values of five spectral bands (red, blue, green, near-infrared, and red-edge) and three vegetation indices (normalized difference vegetation index, normalized difference red-edge, and green normalized difference vegetation index) were extracted from multispectral images. The percent ground coverage was also extracted from RGB images. 3. Identify key traits that can be used to predict plant yield, such as growth rate and plant yield. a. Confirm that single plants can be used to predict full plot response. Correlation analysis was used to relate phenotypic traits collected on single plants to those collected on full plots. Several traits had highly significant (p < 0.001) correlations for two years of data, including spikelets per spike, plant height, and spike length. HTP traits showed good correlation to grain yield in both the full plots and single plants. For 2019 (data analyzed spring 2020), full plot vegetation indices were positively correlated to grain yield. This finding is similar to many results reported in the literature; however, when we compared single plant vegetation indices to full plot yield, nearly every measurement time was significantly (p < 0.001) negatively correlated to plot yield. 
While the negative correlation was unexpected, throughout the growing season the same magnitude of negative correlation was maintained. This supports that spectral reflectance can be used to select superior plants even at a single plant level, but that it will be critical to derive the correlations and prediction models using single plant data and that extrapolation of HTP plots to single plants is limited. 4. Combine genomic and phenomic data to unravel the genotype to phenotype (G2P) challenge. Genomic techniques including QTL mapping, association mapping, and genomic selection will be applied to better predict plant behavior and growth. a. Identify genetic architecture for measured phenotypic traits. While the 2019 data identified similar genomic regions in single plant and full plot for spikelets per spike and spike length, these regions were not identified in a GWAS of 2020 data. This could be due to the quantitative nature of these traits, as well as potential of manual phenotyping changes due to the COVID-19 pandemic. Final analysis will investigate relationship between spectral reflectance and genetic architecture.
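
The three vegetation indices named under objective 2b are commonly defined as normalized differences between the near-infrared band and a second band. The sketch below uses those standard textbook definitions on single reflectance values. Whether the traitExtraction pipeline computes the indices in exactly this way is an assumption, and the function names and example reflectances are hypothetical.

```python
def normalized_difference(a, b):
    """Return (a - b) / (a + b), or None when the denominator is zero."""
    denom = a + b
    if denom == 0:
        return None
    return (a - b) / denom


def vegetation_indices(red, green, red_edge, nir):
    """Compute NDVI, NDRE, and GNDVI from band reflectance values.

    NDVI  = (NIR - Red)      / (NIR + Red)
    NDRE  = (NIR - RedEdge)  / (NIR + RedEdge)
    GNDVI = (NIR - Green)    / (NIR + Green)
    """
    return {
        "NDVI": normalized_difference(nir, red),
        "NDRE": normalized_difference(nir, red_edge),
        "GNDVI": normalized_difference(nir, green),
    }


if __name__ == "__main__":
    # Hypothetical reflectance values for one plant, for illustration only.
    for name, value in vegetation_indices(0.1, 0.2, 0.3, 0.5).items():
        print(name, round(value, 3))
```

For a single plant, the band values passed in would typically be summaries (for example the mean) over the pixels assigned to that plant.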

Publications

Type: Book Chapters Status: Submitted Year Published: 2021 Citation: Crain, J., Wang, X., Lucas, M. & Poland, J. (Submitted 2020). Experiences of applying high-throughput phenotyping for wheat breeding and improvement. In Advanced Concepts and Strategies in Plant Sciences: High-Throughput Crop Phenotyping. Springer.

Progress 12/01/18 to 11/30/19

Outputs
Target Audience:The target audience is plant breeders in the national and international community. Changes/Problems:Data processing, specifically extracting the images of single plants has been more challenging than anticipated. While the initial trial was planted on 50 cm grids, image data had overlap between plants. When plants overlap, the difficulty to parse pixels to their respective plant increases dramatically. While we feel that this issue can be resolved through better algorithms, we have also adjusted the planting spacing to 70 cm grids for the second year of data collections. Physically processing the samples has taken longer than anticipated. For final yield data threshing single plants and obtaining thousand kernel weight is still ongoing for the first year of data collection. For the second year of data, larger threshing machines may be used to speed data collection. This has also slowed pipeline analysis of preliminary results and only hand measured phenotypic traits have been reported. Yield data is essential for mapping HTP data to plant yield. Open source pipelines have also not been developed as expediently as planned. Initial work to extract data and curate samples has not been uploaded to open source repositories. The first year of data collection has provided opportunity to develop a pipeline which can be deployed making the second year of data acquisition and analysis more efficient as well as open source in real-time. What opportunities for training and professional development has the project provided?Professional development and training has consisted of attending research conferences and presenting research results through oral presentations. As a postdoc, PI Crain has also had the opportunity to develop project management, organizational, and reporting skills for maintaining scientific research. 
How have the results been disseminated to communities of interest?Current research activities have been presented at laboratory and departmental meetings. The first year of field trials are currently being evaluated with a second year of field trials established which will provide results for peer-reviewed journal articles. Meetings presenting research: Jared Crain Department of Plant Pathology Departmental Seminar April 25, 2019 Jared Crain Laboratory Meeting Poland Lab October 23, 2019 What do you plan to do during the next reporting period to accomplish the goals?The first year of the project has been successful in developing data and knowledge toward accomplishing project goals. Using the first year as a springboard, the following activities will be undertaken to complete the project goals. Specific Objectives: 1. Adapt and extend HTP methods to the scale of single plants. Evaluate second year of field trials for artificial segregating population. Currently, the second year field trials have been planted, and phenotypic trait evaluation will be conducted in the spring of 2020. For consistency, all previously measured traits will be evaluated in the same manner, providing two-site years of data for publications. 2. Develop open source pipelines that extract biological relevant data from images, and store and curate HTP data in well documented open source repositories. Document data pipeline for image trait extraction Currently, a highly-customized image processing pipeline is used to extract single plant image data from HTP imagery.Along with extracting plant information, the pipeline geo-references plants, and provides trait output for statistical analysis.Work during the next reporting period will document this pipeline, and customizable for other programs to use. Place image pipeline on open source repositories. The data processing pipeline will be placed on an open source data or code repository such as git which will allow anyone to access the information. 
This will be crucial in broadening the impact of this research.

3. Identify key traits that can be used to predict plant yield, such as growth rate and plant yield.
Develop models that describe plant growth.
Mathematical models, including logistic growth curves, will be developed to describe plant growth. These models will allow growth to be decomposed into a few key traits which can be used for QTL or association mapping.
Correlate HTP data to plant yield.
Spectral data collected with UAVs will be correlated to plant and plot yield. This will allow us to identify which traits can be used for predicting plant yield, and how plant yield can be selected in early generations.

4. Combine genomic and phenomic data to unravel the genotype to phenotype (G2P) challenge. Genomic techniques including QTL mapping, association mapping, and genomic selection will be applied to better predict plant behavior and growth.
Complete GWAS analysis of yield traits.
Current results for phenotypic traits suggest that single plants can be used to predict full plot phenotypes. These analyses have not yet included grain yield or HTP data, as final yield data are still being processed.
Create GS models to predict traits.
Develop GS models and evaluate prediction accuracy of single plant traits within the artificial segregating population.

5. Create and distribute open source user tutorials to enhance scientists' knowledge and enable high-throughput phenotyping in their respective programs.
Document and create analysis tutorials and examples.
Curate scripts into a more user-friendly format.
Place tutorials and data sets in public repositories.
Once documents are created, place them on an open source repository such as git or Dryad. This will allow others to recreate the analysis as well as build their own analyses.
Publish results with full data analysis.
Results from the experiment will be published in a peer-reviewed journal along with full data and code as supplementary files, providing open access to the data and methods.
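
As an illustration of the growth models described under Objective 3 above, a logistic curve reduces a season of measurements to a few parameters (asymptote, growth rate, and inflection time) that could then serve as traits for mapping. The following Python sketch is not the project's code. The function names, the choice to fix the asymptote K in advance (for example, 100 for percent ground cover), and the sample values are all assumptions made for illustration.

```python
import math


def logistic(t, K, r, t0):
    """Logistic growth value at time t: asymptote K, rate r, inflection time t0."""
    return K / (1.0 + math.exp(-r * (t - t0)))


def fit_logistic_fixed_K(times, values, K):
    """Estimate r and t0 by linearizing ln(y / (K - y)) = r * (t - t0).

    Assumption: K is known in advance and every value satisfies 0 < y < K.
    This is a simple least-squares line fit, not a full nonlinear fit.
    """
    z = [math.log(y / (K - y)) for y in values]
    n = len(times)
    mean_t = sum(times) / n
    mean_z = sum(z) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxz = sum((t - mean_t) * (zi - mean_z) for t, zi in zip(times, z))
    r = sxz / sxx
    intercept = mean_z - r * mean_t  # the fitted line is z = r * t + intercept
    t0 = -intercept / r              # intercept equals -r * t0
    return r, t0


def growth_traits(K, r, t0):
    """Summarize a fitted curve as a small set of candidate traits."""
    return {
        "asymptote": K,
        "rate": r,
        "inflection_time": t0,
        "max_growth_rate": K * r / 4.0,  # slope of the curve at t = t0
    }


if __name__ == "__main__":
    # Hypothetical example: five imaging dates (days after transplanting)
    # with synthetic percent ground cover generated from a known curve.
    days = [20, 35, 50, 65, 80]
    cover = [logistic(d, 100.0, 0.1, 50.0) for d in days]
    r_hat, t0_hat = fit_logistic_fixed_K(days, cover, 100.0)
    print(growth_traits(100.0, r_hat, t0_hat))
```

With noisy field data, a nonlinear least-squares fit that also estimates K would usually be preferable; the linearized version is shown only because it is short and has no dependencies.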

Impacts
What was accomplished under these goals?
This project seeks to increase the rate of genetic gain made in plant breeding programs by giving breeders quantitative tools to make early-generation selections. By making quantitative selections, rather than qualitative observations such as disease resistant or susceptible, we aim to provide plant breeders with the tools to advance higher quality germplasm as plants move to later stages and preliminary yield testing within the breeding program. HTP technology has been developed for measurement of full plots and so can only be applied at later stages in the breeding pipeline (yield and preliminary yield trial stages that occur five to seven years after crossing). Through this project, we aim to transfer that technology into early plant breeding stages. To make the methodology easy to implement, user-friendly software pipelines will be developed and released to the broader research community.

Progress on specific goals and objectives:
1. Adapt and extend HTP methods to the scale of single plants.
Test a single plant panel of inbred lines for initial evaluation.
Using 339 public lines representing a diverse collection of elite winter wheat germplasm, we developed a four-replicate randomized complete block design of single plants along with a concurrent two-replicate trial of the same lines in full yield plots. Single plants were started in the greenhouse and transplanted to the field on 50 cm grids in the fall of 2018. In 2019, the same population was placed in the field following similar procedures, with the exception of using a 70 cm grid. This population of single plants with paired full plots provides a proof-of-concept panel to phenotype without the challenges of segregation. This design provides an opportunity to verify single plant data against full plot data.
Collect HTP data from single plants.
Throughout the growing season, HTP data were collected at five time points with an unmanned aerial vehicle (UAV) using multispectral imaging (detailed below). In addition to collecting HTP data, manual traits of heading date, number of spikes per plant, spike length, and spikelets per spike were collected. Final yield data per plant, including thousand kernel weight, were collected on all single plants with hand harvesting and on full plots with combine harvest.

2. Develop open source pipelines that extract biologically relevant data from images, and store and curate HTP data in well documented open source repositories.
Collect HTP data from single plants.
We have tested and deployed three unmanned aerial systems for multispectral and high-resolution RGB image acquisition: the DJI M100 UAV with the MicaSense RedEdge-M multispectral camera, the DJI M600 UAV with the Zenmuse X5R digital camera, and the DJI M600 UAV with the Sony Alpha 7R III digital camera. Current laboratory procedures were used to store metadata including date and time of data collection, image naming, and database storage and retrieval.
Extract data from HTP images of single plants.
A Python-based image processing pipeline (github.com/xwangksu/bip) and a Python-based digital trait extraction pipeline (github.com/xwangksu/traitExtraction) were used for data analysis. The plant-level canopy reflectance trait values of five spectral bands (red, blue, green, near-infrared, and red-edge) and three vegetation indices (normalized difference vegetation index, normalized difference red-edge, and green normalized difference vegetation index) were extracted from multispectral images. The percent ground coverage was also extracted from RGB images.

3. Identify key traits that can be used to predict plant yield, such as growth rate and plant yield.
Confirm that single plants can be used to predict full plot response.
Using collected phenotypic data, the correlation between entries grown as single plants and full plots was 0.53 for spike length and 0.44 for spikelets per spike. There was no relationship between the number of tillers per plant on single plants and spikes per area in full plots. These results indicate that some traits, particularly yield component traits, can be measured on single plants and provide a reliable indication of full plot growth.

4. Combine genomic and phenomic data to unravel the genotype to phenotype (G2P) problem. Genomic techniques including QTL mapping, association mapping, and genomic selection will be applied to better predict plant behavior and growth.
Identify genetic architecture for measured phenotypic traits.
Using data collected on the single plants and the full yield plots, genome-wide association mapping was completed to identify genomic regions controlling trait expression. For two traits, spikelets per spike and spike length, the same regions of the genome were identified in both single plants and full yield plots. This is a significant finding, providing evidence that component traits measured in single plants can be indicative of full plot yield response. Specifically, if an early-generation plant is expressing desired phenotypic traits, it is likely that later generations in full yield plots will continue to express the trait. This is a key result driving future development of these methods.
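
The three vegetation indices named under Objective 2 above are all normalized differences between near-infrared reflectance and a second band. The short Python sketch below shows the standard formulas only; it is not taken from the traitExtraction pipeline, and the function names, dictionary keys, and reflectance values are hypothetical.

```python
def normalized_difference(a, b):
    """Return (a - b) / (a + b), or None when the denominator is zero."""
    total = a + b
    if total == 0:
        return None
    return (a - b) / total


def vegetation_indices(bands):
    """Compute NDVI, NDRE, and GNDVI from mean canopy reflectance per band.

    `bands` is a dict with (hypothetical) keys "red", "green",
    "red_edge", and "nir"; extra keys such as "blue" are ignored.
    """
    nir = bands["nir"]
    return {
        "ndvi": normalized_difference(nir, bands["red"]),        # (NIR - Red) / (NIR + Red)
        "ndre": normalized_difference(nir, bands["red_edge"]),   # (NIR - RedEdge) / (NIR + RedEdge)
        "gndvi": normalized_difference(nir, bands["green"]),     # (NIR - Green) / (NIR + Green)
    }


if __name__ == "__main__":
    # Made-up reflectance values for one plant on one imaging date.
    plant = {"red": 0.05, "green": 0.08, "blue": 0.04, "red_edge": 0.20, "nir": 0.45}
    for name, value in vegetation_indices(plant).items():
        print(name, round(value, 3))
```

Applying a function like this to each plant at each of the five imaging dates would give a per-plant time series for every index, which is the form of data the statistical analyses described above would need.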

Publications