Source: UNIVERSITY OF ILLINOIS submitted to NRP
PARTNERSHIP: PHENOMIC ASSISTED GENOMIC SELECTION TO ACCELERATE GRAIN YIELD IMPROVEMENT IN SMALL GRAINS
Sponsoring Institution
National Institute of Food and Agriculture
Project Status
ACTIVE
Funding Source
Reporting Frequency
Annual
Accession No.
1030029
Grant No.
2023-67013-39612
Cumulative Award Amt.
$799,999.00
Proposal No.
2022-10237
Multistate No.
(N/A)
Project Start Date
May 15, 2023
Project End Date
May 14, 2027
Grant Year
2023
Program Code
[A1141]- Plant Health and Production and Plant Products: Plant Breeding for Agricultural Production
Recipient Organization
UNIVERSITY OF ILLINOIS
2001 S. Lincoln Ave.
URBANA, IL 61801
Performing Department
(N/A)
Non Technical Summary
This project aims to develop a selection method that synergistically combines High-Throughput Phenotyping (HTP) and Genomic Selection (GS) to accelerate yield improvement in public small grains breeding programs with modest budgets. Our proposed method, 'Phenomic Assisted Genomic Selection' (PAGS), uses HTP data to impute yield on a proportion of research plots. Both imputed and true yield data are then used for GS model training. PAGS enables breeders to generate the large GS training datasets required for successful GS among untested breeding candidates, unlocking the potential of GS to shorten breeding cycles and accelerate rates of genetic gain. To develop PAGS, we will generate and analyze yield and HTP data to test the limits of grain yield imputation using HTP data and identify the best statistical or machine learning model for this purpose. Next, we will conduct validation studies to evaluate the effect of including imputed yield data on GS accuracy under different scenarios. Lastly, we will use stochastic simulations to evaluate the costs and benefits of PAGS and determine its merit compared to alternative strategies.
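Why a larger training set should help can be illustrated with the deterministic approximation of genomic prediction accuracy published by Daetwyler and colleagues (2008), r = sqrt(N h² / (N h² + Me)), where N is the number of phenotyped training individuals, h² is the heritability of the training phenotypes, and Me is the effective number of independent chromosome segments. The minimal sketch below assumes hypothetical values (h² = 0.3, Me = 1000). It treats every training phenotype as equally informative, which imputed yields are not, so it shows only the direction of the effect, not the accuracy PAGS would achieve.

```python
import math


def expected_gs_accuracy(n_train: int, h2: float, m_e: float) -> float:
    """Deterministic approximation of genomic prediction accuracy
    (Daetwyler et al., 2008): r = sqrt(N*h2 / (N*h2 + Me))."""
    if n_train <= 0 or not 0.0 < h2 <= 1.0 or m_e <= 0:
        raise ValueError("need n_train > 0, 0 < h2 <= 1 and m_e > 0")
    nh2 = n_train * h2
    return math.sqrt(nh2 / (nh2 + m_e))


# Hypothetical inputs: h2 = 0.3 for plot-level yield, Me = 1000 segments.
for n in (300, 1000, 3000):
    print(n, round(expected_gs_accuracy(n, h2=0.3, m_e=1000), 3))
```

Under these assumptions, accuracy rises steeply as N grows from a few hundred to a few thousand plots, which is the gap PAGS tries to close cheaply with imputed phenotypes.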
Animal Health Component
50%
Research Effort Categories
Basic
50%
Applied
50%
Developmental
(N/A)
Classification

Knowledge Area (KA) | Subject of Investigation (SOI) | Field of Science (FOS) | Percent
201 | 1599 | 1080 | 50%
201 | 1549 | 1080 | 50%
Goals / Objectives
Our main goal is to evaluate a new method for improving genomic selection (GS) accuracy for grain yield among untested genotypes using high-throughput phenotypic (HTP) data. With this method, which we refer to as Phenomic Assisted Genomic Selection (PAGS), HTP data are used to impute yield on research plots, and the imputed yield data are used along with real yield data for GS model training. The intended outcome of this method is to reduce the cost of GS model training and unlock the potential of GS to accelerate rates of genetic gain in resource-limited breeding programs working on wheat and other small grains. To achieve this overall goal, we will complete the following objectives: 1) Evaluate methods for imputing grain yield phenotypes from HTP data, 2) Determine under what conditions imputed grain yield values will increase GS accuracy for untested genotypes, and 3) Evaluate the costs and benefits of using imputed phenotypes in GS models for grain yield to determine when the approach would be advantageous in a breeding program. Another goal of this project is to help train the plant breeding community on HTP and GS by hosting an annual workshop, enabling the community to take advantage of these technologies and the methods that we develop.
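Objective 3 weighs costs against benefits, and a common yardstick for the benefit side is the breeder's equation expressed per unit time: gain per year = i × r × σA / L, where i is selection intensity, r is selection accuracy, σA is the additive genetic standard deviation, and L is the cycle length in years. The minimal sketch below uses purely hypothetical inputs to show how a less accurate prediction can still deliver more gain per year if it shortens the cycle. It omits costs entirely, which the project's stochastic simulations are meant to capture.

```python
def gain_per_year(intensity: float, accuracy: float, sigma_a: float,
                  cycle_years: float) -> float:
    """Breeder's equation per unit time: i * r * sigma_A / L."""
    if cycle_years <= 0:
        raise ValueError("cycle length must be positive")
    return intensity * accuracy * sigma_a / cycle_years


# Hypothetical scenarios (illustrative numbers, not project estimates):
# selection after multi-year yield trials vs. GS on untested lines.
phenotypic = gain_per_year(intensity=1.4, accuracy=0.6, sigma_a=1.0, cycle_years=4.0)
genomic = gain_per_year(intensity=1.4, accuracy=0.4, sigma_a=1.0, cycle_years=2.0)
print(f"phenotypic: {phenotypic:.3f} SD/yr, genomic: {genomic:.3f} SD/yr")
# prints: phenotypic: 0.210 SD/yr, genomic: 0.280 SD/yr
```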
Project Methods
We will develop and test PAGS, a strategy that uses HTP to increase the size of the GS model training set and improve GS model accuracy among untested genotypes. For PAGS, we separate the model training set into two subsets: 1) the set of genotypes with both grain yield and HTP data, which we refer to as the 'Core Training Set', and 2) the set of genotypes with only HTP data, which we refer to as the 'Accessory Training Set'. For the Accessory Training Set, yield phenotypes within an environment will be imputed using Core Training Set data collected in the same environment as the reference data. We expect the Accessory Training Set to improve prediction accuracy by increasing the size of the overall training set. The Core Training Set is needed to impute yield phenotypes for the Accessory Training Set and to estimate the genetic and residual covariances between actual and imputed yield data, so that both types of data are weighted appropriately in the genomic prediction process. To evaluate PAGS, we will collect HTP and yield data on research plots at the University of Illinois and at Utah State University. For the Accessory Training Set, the University of Illinois will use large plots, whereas Utah State University will use small plots. We will begin by evaluating methods that use HTP data to impute yield, to determine the best method. Next, we will evaluate how imputed yield data on the Accessory Training Set affect GS accuracy among untested genotypes. Finally, we will conduct a cost-benefit analysis to determine whether PAGS can increase genetic gain per unit time given current estimates of imputation and PAGS accuracy.
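The within-environment imputation step can be sketched as follows. This is a minimal NumPy example under stated assumptions: plots with observed yield form the Core Training Set, plots whose yield is NaN form the Accessory Training Set, and ordinary least squares on the HTP features stands in for whichever imputation model the project selects. The function name `impute_yield_within_env` and the simulated features are hypothetical. The sketch does not estimate the genetic and residual covariances between actual and imputed yield, which the Methods describe as necessary for weighting the two data types.

```python
import numpy as np


def impute_yield_within_env(env, htp, yield_obs):
    """Impute NaN yields from HTP features separately within each environment.

    Plots with observed yield act as the Core Training Set (reference data);
    plots with NaN yield are the Accessory Training Set. One least-squares
    model with an intercept is fit per environment, using only that
    environment's core plots.
    """
    env = np.asarray(env)
    htp = np.asarray(htp, dtype=float)
    y = np.asarray(yield_obs, dtype=float).copy()
    n_params = htp.shape[1] + 1  # intercept + one slope per HTP feature
    for e in np.unique(env):
        in_env = env == e
        core = in_env & ~np.isnan(y)
        accessory = in_env & np.isnan(y)
        if not accessory.any():
            continue
        if core.sum() < n_params:
            raise ValueError(f"environment {e!r} has too few core plots")
        X_core = np.column_stack([np.ones(core.sum()), htp[core]])
        beta, *_ = np.linalg.lstsq(X_core, y[core], rcond=None)
        X_acc = np.column_stack([np.ones(accessory.sum()), htp[accessory]])
        y[accessory] = X_acc @ beta
    return y


# Hypothetical example: two environments, two HTP features per plot
# (e.g. a vegetation index and canopy height), 80% of yields missing.
rng = np.random.default_rng(1)
n = 40
env = np.repeat(["env1", "env2"], n // 2)
htp = rng.normal(size=(n, 2))
true_yield = np.where(env == "env1",
                      5.0 + 1.0 * htp[:, 0] + 0.5 * htp[:, 1],
                      3.0 + 0.2 * htp[:, 0] + 2.0 * htp[:, 1])
observed = true_yield.copy()
observed[np.arange(n) % 5 != 0] = np.nan  # keep yield on every fifth plot only
imputed = impute_yield_within_env(env, htp, observed)
```

Because the simulated yields are an exact linear function of the features, imputation is exact here. Real plots add noise, which is why imputation accuracy has to be measured empirically. Fitting one model per environment mirrors the rule that reference data come from the same environment as the plots being imputed.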

Progress 05/15/24 to 05/14/25

Outputs
Target Audience: During this reporting period, the target audience was other researchers and plant breeders, including graduate students, post-doctoral researchers, professors, and other professionals carrying out or leading research. This group was targeted because the research that we are doing for this project aims to develop a new breeding strategy that will benefit other breeding and research programs.
Changes/Problems: No major changes to the project are noted; however, the PhD student under the supervision of lead PI Dr. Rutkoski received a fellowship, and therefore expenditures on the grant have been delayed compared to what was originally planned. In addition, co-PI Dr. Krause's PhD student turned out to be an unsuitable candidate for this project; instead, Dr. Krause will assign the responsibilities of this project to a post-doc, thus delaying project expenditures.
What opportunities for training and professional development has the project provided? As mentioned in the section on accomplishments, we trained 47 people on collecting and analyzing HTP data. Most of the participants were at the early stages of their careers. This project is also training one PhD student and one post-doc in depth. The PhD student and post-doc are learning how to use artificial intelligence and statistical genetics to predict breeding values.
How have the results been disseminated to communities of interest? Dr. Rutkoski, the lead PI of this project, presented preliminary results of this project on January 12, 2025, at the Plant and Animal Genome (PAG) meeting in San Diego, California. The PhD student under the supervision of Dr. Rutkoski presented a poster with her research findings at the National Association of Plant Breeders meeting in July 2025 and at the PAG meeting in January 2025.
What do you plan to do during the next reporting period to accomplish the goals? During the next reporting period, we will continue our analyses evaluating how using imputed yield data in genomic selection models affects prediction accuracy among tested and untested breeding candidates. Also during the next reporting period, we will complete one or two publications communicating our research findings. Regarding training, we will make our workshop materials available online for public access and plan the next in-person workshop.

Impacts
What was accomplished under these goals? We evaluated more than ten different phenotype imputation methods for yield using high-throughput phenotypic (HTP) data and identified the best methods for further testing. Linear models were found to outperform non-parametric methods. We also examined imputation accuracy across a range of missing-data levels and determined that accurate imputations can be obtained even when 90% of the actual yield data are missing. In addition, we developed methods for combining HTP data across environments and found that using data from multiple environments can improve within-environment imputation accuracy. Next, we began evaluating multi-trait genomic selection models for predicting yield. We identified a factor-analytic model that treats trait-environment combinations as separate traits as the best model for our purposes, and we are currently incorporating the imputed yield data into the model. Regarding training, we hosted a three-day workshop with 47 in-person attendees. At the workshop, we taught participants how to collect and analyze HTP data, and we provided them with code they can use to impute phenotypic data and conduct multi-trait genomic selection using HTP data.
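The factor-analytic (FA) structure models the genetic covariance among t trait-environment combinations as G = ΛΛᵀ + Ψ, where Λ is a t × k matrix of loadings on k latent factors and Ψ is a diagonal matrix of combination-specific variances. The NumPy sketch below only assembles G and its correlation matrix from hypothetical loadings for four illustrative combinations (actual and imputed yield in two environments). In practice Λ and Ψ are estimated by REML in mixed-model software, and nothing here reproduces the project's fitted model.

```python
import numpy as np


def fa_covariance(loadings, specific_var):
    """Return G = L @ L.T + diag(psi) for a factor-analytic covariance model."""
    lam = np.asarray(loadings, dtype=float)
    psi = np.asarray(specific_var, dtype=float)
    if lam.ndim != 2 or psi.shape != (lam.shape[0],):
        raise ValueError("loadings must be t x k and specific_var length t")
    if np.any(psi < 0):
        raise ValueError("specific variances must be non-negative")
    return lam @ lam.T + np.diag(psi)


def cov_to_corr(cov):
    """Scale a covariance matrix to a correlation matrix."""
    sd = np.sqrt(np.diag(cov))
    return cov / np.outer(sd, sd)


# Hypothetical trait-environment combinations and k = 2 loadings.
combos = ["yield_env1", "imputed_env1", "yield_env2", "imputed_env2"]
loadings = [[0.9, 0.2],
            [0.8, 0.1],
            [0.6, 0.5],
            [0.5, 0.6]]
psi = [0.20, 0.30, 0.25, 0.35]

G = fa_covariance(loadings, psi)
R = cov_to_corr(G)
for name, row in zip(combos, R):
    print(f"{name:>13}", np.round(row, 2))
```

The appeal for this kind of data is parsimony: an unstructured G for t combinations has t(t + 1)/2 parameters, whereas an FA model with k factors has roughly t(k + 1), so adding imputed-yield combinations does not multiply the number of covariances to estimate.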

Publications


Progress 05/15/23 to 05/14/24

Outputs
Target Audience: During this reporting period, the target audience was other researchers and plant breeders. This group was targeted because the research that we are doing for this project aims to develop a new breeding strategy that will benefit other breeding and research programs.
Changes/Problems: During the project, co-PD Dr. Krause moved from Utah State University to Oregon State University. Project activities originally planned to occur at Utah State University will now occur at Oregon State University. During this disruptive transition period, Dr. Krause was not successful in hiring a suitable PhD student or in collecting adequate data for this project; therefore, Dr. Krause will complete the project objectives by hiring a post-doc and using data that have already been collected. Due to these changes, we anticipate that one additional year may be needed to complete the project.
What opportunities for training and professional development has the project provided? A graduate student under the supervision of Dr. Rutkoski received one-on-one training from Dr. Rutkoski to learn 1) how to evaluate different grain yield imputation methods and 2) how to design simulations with the data collected for the project to answer research questions. For professional development, the graduate student also engaged in individual study and attended two different scientific conferences, where she presented posters of her work.
How have the results been disseminated to communities of interest? For outreach, Dr. Rutkoski held a field day at the University of Illinois, where she and the graduate student working on this project gave an oral presentation about the project's goals to community members in attendance.
What do you plan to do during the next reporting period to accomplish the goals? During the next reporting period, the project directors will host a workshop to train graduate students and other researchers on how to use drones to conduct high-throughput phenotyping and how to use the data for imputation and genomic selection model training. Also during the next reporting period, we will 1) publish our findings about grain yield imputation using UAVs, 2) conduct analyses to determine whether imputed yield data can help train more accurate genomic selection models, and 3) collect more conventional and high-throughput phenotypic data on research plots for further analyses in this project.

Impacts
What was accomplished under these goals? Small grain crops contribute to the sustainability of agriculture by helping to diversify crop rotations. Accelerating yield gains in small grains is important to ensure that farmers can diversify their crop rotations without reducing their income. To accelerate yield gains in small grains, this project is developing an improved breeding method that uses drones and genomic data to train predictive models that will help plant breeders improve yield more effectively. The method uses drone data to predict yield on research plots, a process called imputation. The imputed yield data are then used to help train genomic prediction models that identify the best individuals for further breeding. During the project period, we collected and analyzed data on research plots using drones and conventional phenotyping methods during the 2022-2023 growing season. We also evaluated three different machine learning methods for their ability to impute yield from the drone data. Of the three methods evaluated, 'MissForest' was the most accurate. This outcome matters because it indicates that researchers can successfully impute yield data using data from drones, and the imputed data may be useful for genomic selection. This will impact breeders and researchers because it provides them with a tool they can implement to improve crops more effectively.
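MissForest fills missing values iteratively. It starts from mean imputation, then repeatedly regresses each incomplete variable on all the others (with a random forest) and re-predicts its missing entries, visiting variables from least to most missing and stopping once the change between successive imputations increases. The NumPy sketch below reproduces that iteration and stopping rule, but to stay dependency-light it swaps in ordinary least squares as the per-variable regressor. It is therefore a linear analogue of MissForest, not MissForest itself. The names `iterative_impute` and `ols_fit_predict` are hypothetical, as are the simulated drone features.

```python
import numpy as np


def ols_fit_predict(X_train, y_train, X_new):
    """Stand-in regressor: least squares with an intercept (not a random forest)."""
    A = np.column_stack([np.ones(len(X_train)), X_train])
    beta, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return np.column_stack([np.ones(len(X_new)), X_new]) @ beta


def iterative_impute(data, fit_predict=ols_fit_predict, max_iter=10):
    """MissForest-style iterative imputation with a pluggable regressor.

    `data` is an (n_plots, n_variables) array with NaN for missing values.
    """
    X = np.asarray(data, dtype=float)
    missing = np.isnan(X)
    if missing.all(axis=0).any():
        raise ValueError("every column needs at least one observed value")
    # Initial guess: replace each missing value with its column mean.
    current = np.where(missing, np.nanmean(X, axis=0), X)
    # Visit columns from least to most missing values.
    order = np.argsort(missing.sum(axis=0), kind="stable")
    prev_diff = np.inf
    for _ in range(max_iter):
        previous = current.copy()
        for j in order:
            rows = missing[:, j]
            if not rows.any():
                continue
            others = np.delete(current, j, axis=1)
            current[rows, j] = fit_predict(others[~rows], current[~rows, j], others[rows])
        # Normalized squared change between successive imputations.
        diff = np.sum((current - previous) ** 2) / np.sum(current ** 2)
        if diff > prev_diff:
            # The change increased: keep the previous imputation.
            return previous
        prev_diff = diff
    return current


# Hypothetical plot data: column 0 = yield, columns 1-2 = drone-derived features.
rng = np.random.default_rng(0)
features = rng.normal(size=(30, 2))
true_yield = 4.0 + 2.0 * features[:, 0] + 1.0 * features[:, 1]
data = np.column_stack([true_yield, features])
data[::3, 0] = np.nan  # hide yield on every third plot
imputed = iterative_impute(data)
```

Passing a `fit_predict` function that wraps scikit-learn's `RandomForestRegressor` (calling `fit`, then `predict`) in place of `ols_fit_predict` would bring the sketch closer to MissForest. For real analyses, the `missForest` R package is the reference implementation.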

Publications