Source: UNIV OF WISCONSIN submitted to
PARTNERSHIP: OPTIMIZATION OF MATE ALLOCATION WITH GENOMIC IDENTITY-BY-DESCENT PROBABILITIES
Sponsoring Institution
National Institute of Food and Agriculture
Project Status
NEW
Funding Source
Reporting Frequency
Annual
Accession No.
1032360
Grant No.
2024-67013-42585
Project No.
WIS06012
Proposal No.
2023-11030
Multistate No.
(N/A)
Program Code
A1141
Project Start Date
Jul 1, 2024
Project End Date
Jun 30, 2027
Grant Year
2024
Project Director
Endelman, J.
Recipient Organization
UNIV OF WISCONSIN
21 N PARK ST STE 6401
MADISON,WI 53715-1218
Performing Department
(N/A)
Non Technical Summary
The development of new plant varieties through breeding is a powerful approach to increasing both the quantity and quality of food. Plant breeders are increasingly adopting genomic selection methods, which use DNA sequence and historical trait data to make statistical predictions. Because genomic selection shortens the breeding cycle, one of its potential pitfalls is an accelerated loss of genetic diversity. For specialty and forage crops, this loss of diversity can be controlled by limiting the inbreeding rate. Experience from decades of animal breeding and plant breeding simulations indicates that an inbreeding rate of 1% is optimal. This project will develop new methods and software to overcome the main technical obstacle to realizing this selection methodology in our target crops of potato and blueberry. Because these crops contain four sets of chromosomes, instead of two sets like animals and many plants, the calculation of genomic inbreeding from genetic marker data is more complicated. Computer simulations will be used to validate our software and predict its impact on genetic improvement and diversity after 20 years. The new tools will be adopted in our breeding programs at the Universities of Wisconsin and Florida, and we expect other plant breeding programs will follow our lead.
Animal Health Component
0%
Research Effort Categories
Basic
30%
Applied
40%
Developmental
30%
Classification

Knowledge Area (KA) | Subject of Investigation (SOI) | Field of Science (FOS) | Percent
201 | 2410 | 1081 | 100%
Goals / Objectives
The overarching goal of this project is to enable sustainable use of genomic selection for long-term genetic gain in outbred crops, particularly polyploids. There is a general awareness among plant breeders that selection for short-term gain can deplete genetic variance too quickly, but a scientific approach to the problem is lacking. Our framework, which is based on established concepts from animal breeding, involves the design of mating plans that maximize a multi-trait index at a specified inbreeding rate. The main impediment to realizing this goal in potato, blueberry, alfalfa, and other autotetraploid crops is that no software exists to compute genomic kinship in pedigreed populations by linkage analysis, i.e., based on identity-by-descent (IBD). Our objectives are therefore to:
1. Develop methodology and software to compute genomic kinship in outbred diploid and tetraploid species.
2. Validate the use of genomic kinship for optimum mate allocation via stochastic simulation.
3. Implement the new selection methodology for potato and blueberry breeding at the Universities of Florida and Wisconsin.
Project Methods
1. Develop methodology and software to compute genomic kinship in outbred diploid and tetraploid species.
Calculation of genomic kinship requires reconstructing individuals in a pedigreed population in terms of the founder haplotypes. The first step is to generate phased genotypes for every parent using Hidden Markov Models (HMM). For both diploids and tetraploids, each parent has one hidden variable per marker, representing the phased genotype. For example, if the allele dosage in a tetraploid parent is 1 at a particular locus, there are four possible phased genotypes: 0 0 0 1, 0 0 1 0, 0 1 0 0, 1 0 0 0. In tetraploids, every gamete requires an additional hidden variable to specify the valent configuration that occurred during meiosis; there are three possible bivalent pairings and one quadrivalent. The maximum likelihood solution is found iteratively, with two steps per iteration. In step one, independently for each offspring, the valent variables are selected by maximum likelihood (ML), conditional on the phased genotypes. In step two, sequentially for each parent, the ML solution for phased genotype is computed, conditional on the valent variables. The presence of "intermediators" (individuals that are both parents and progeny) introduces additional complexity compared to the original version of PolyOrigin. After phasing, both the Viterbi and Forward-Backward algorithms will be used to reconstruct all individuals in terms of their parental haplotypes. The Viterbi algorithm provides the most probable solution, conditional on the phased parental genotypes, while the Forward-Backward algorithm provides the marginal posterior probabilities. The inferred parent origins will be linked across generations so that haplotypes can be traced back to the founder population.
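The hidden-state spaces described above can be enumerated directly. The following is a minimal Python sketch, not code from PolyOrigin: the function names `phased_genotypes` and `bivalent_pairings` are hypothetical, homologs are labeled 0 to 3 for illustration, and the quadrivalent is represented simply as a fourth label.

```python
from itertools import combinations

def phased_genotypes(dosage, ploidy=4):
    """All ways to place `dosage` copies of allele 1 on `ploidy` homologs."""
    states = []
    for positions in combinations(range(ploidy), dosage):
        states.append(tuple(1 if h in positions else 0 for h in range(ploidy)))
    return sorted(states)

def bivalent_pairings(homologs=(0, 1, 2, 3)):
    """All ways to split four homologs into two unordered pairs."""
    first, *rest = homologs
    pairings = []
    for partner in rest:
        # The two homologs not paired with `first` form the second bivalent.
        other = tuple(h for h in rest if h != partner)
        pairings.append(((first, partner), other))
    return pairings

# Dosage 1 in a tetraploid: four phased genotypes, as listed in the text.
print(phased_genotypes(1))
# [(0, 0, 0, 1), (0, 0, 1, 0), (0, 1, 0, 0), (1, 0, 0, 0)]

# Three bivalent pairings plus one quadrivalent: four valent configurations.
valent_configurations = bivalent_pairings() + ["quadrivalent"]
print(len(valent_configurations))  # 4
```

The same enumeration gives six phased genotypes for dosage 2 and a single one for dosage 0 or 4, which is why the hidden-state space at a marker depends on the observed dosage.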
Genomic kinship coefficients will be computed as described in the publication for R/diaQTL. These new methods will be incorporated into the PolyOrigin software, which is publicly available on GitHub (https://github.com/chaozhi/PolyOrigin.jl). A detailed tutorial will be created to accompany the software.
Evaluation: The accuracy of parental phasing and ancestral inference will be evaluated by simulating a multi-generational breeding population with PedigreeSim, and the influence of population size and mating design on accuracy will be examined. For diploids, we will compare the performance of PolyOrigin with the software AlphaPeel, which uses a strategy called hybrid peeling.
2. Validate the use of genomic kinship for optimum mate allocation (OMA) via stochastic simulation.
"Mate allocation" is the contribution of each mating to the next generation, i.e., the number of progeny. OMA optimizes mating plans to maximize the mean genetic merit of the next generation, subject to a constraint on inbreeding rate to ensure long-term genetic gain. Our current algorithm for OMA, called COMA (https://github.com/jendelman/COMA), uses pedigree kinship to control inbreeding. Our hypothesis is that using genomic kinship will lead to improved conservation of genetic variance and higher long-term gain.
This hypothesis will be tested using stochastic simulation of a clonal crop breeding program with the AlphaSimR software. The founder population in mutation-drift equilibrium (coalescent model) will have 10 chromosomes and an effective population size of 100. A quantitative trait (which can represent a multi-trait index) with directional dominance will be simulated based on 100 QTL per chromosome and an overall genetic variance of 1 in the founder population. The directional dominance parameter will be drawn from a normal distribution with mean 0.5 and variance 0.5. Phenotypic selection for 15 years will be simulated as a burn-in period prior to beginning genomic selection with a one-year breeding cycle.
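To make the idea of mate allocation under an inbreeding constraint concrete, here is a toy Python sketch. It is not the COMA algorithm. It assumes a diploid population in which the inbreeding coefficient of a progeny equals the kinship of its parents, predicts progeny merit as the mid-parent value (additive effects only), and searches all integer allocations by brute force. The function names, merit values, and kinship matrix are hypothetical.

```python
from itertools import combinations, product

def max_inbreeding(f_now, rate):
    """Largest mean inbreeding allowed next generation,
    using the standard definition rate = (F_next - F_now) / (1 - F_now)."""
    return f_now + rate * (1.0 - f_now)

def best_allocation(merit, kinship, n_progeny, f_limit):
    """Brute-force search over integer progeny counts per mating."""
    matings = list(combinations(range(len(merit)), 2))
    best = None
    for counts in product(range(n_progeny + 1), repeat=len(matings)):
        if sum(counts) != n_progeny:
            continue
        # Mean progeny inbreeding (diploid assumption: F = parental kinship).
        mean_f = sum(c * kinship[i][j] for c, (i, j) in zip(counts, matings)) / n_progeny
        if mean_f > f_limit + 1e-12:
            continue
        # Mean progeny merit (assumption: mid-parent value).
        mean_g = sum(c * (merit[i] + merit[j]) / 2 for c, (i, j) in zip(counts, matings)) / n_progeny
        if best is None or mean_g > best[0]:
            best = (mean_g, mean_f, dict(zip(matings, counts)))
    return best

# Hypothetical example: parents 0 and 1 are full sibs (kinship 0.25), all others unrelated.
merit = [2.0, 1.8, 1.0, 0.5]
kinship = [[0.5, 0.25, 0.0, 0.0],
           [0.25, 0.5, 0.0, 0.0],
           [0.0, 0.0, 0.5, 0.0],
           [0.0, 0.0, 0.0, 0.5]]

limit = max_inbreeding(f_now=0.0, rate=0.01)
mean_g, mean_f, plan = best_allocation(merit, kinship, n_progeny=4, f_limit=limit)
print(mean_g, mean_f, plan[(0, 2)])  # 1.5 0.0 4
```

In this toy case the highest-merit mating (parents 0 and 1, mid-parent value 1.9) is excluded because even a single progeny from it would exceed the 1% limit, so all four progeny go to the mating of parents 0 and 2. A practical optimizer would use continuous contributions and a convex solver rather than enumeration.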
Under genomic selection, 500 parental candidates from the seedling nursery (FY0) will be randomly selected by stratified sampling (10 progeny from each of the 50 F1 populations). For computational tractability, the genomic prediction model will be updated each year by adding the most recent cohort and removing the oldest cohort, which maintains a constant training population size of 4 x 750 = 3000 genotypes. Additive and dominance marker effects will be estimated using R/StageWise, and COMA will be used to develop the mating plan, with a target inbreeding rate of 1%.
Evaluation: After 20 years of genomic selection, the performance of pedigree vs. genomic kinship will be compared based on genetic gain and efficiency, which is the gain per unit loss of genic standard deviation. Five replicates will be conducted per scenario.
3. Implement the new selection methodology for potato and blueberry breeding at the Universities of Florida and Wisconsin.
3.1 Evaluation of historical data. The UF blueberry and UW potato breeding programs have used genomic selection to guide the design of mating plans for the past five breeding cycles. These pedigreed populations will be analyzed using the new PolyOrigin software to compute genomic kinship and inbreeding coefficients. For each generation, all prior data will be used as the training population to predict marker effects, and the optimized mating plan from COMA at 1% inbreeding rate will be determined. The COMA output will be compared with the mating plan that was actually used, in terms of inbreeding rate and single cycle genetic gain.
3.2 Implement the new selection methodology.
Potato: Both UW and UF programs will utilize a multi-trait selection index including total yield, specific gravity, fry color, and maturity. At UF, an additional trait for internal defects due to heat stress will be incorporated.
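As a rough illustration of what a multi-trait selection index does, the Python sketch below combines standardized trait values into a single score per clone. This is a generic linear index, not the index computed by R/StageWise or used by either program. The trait values, the weights, and the function names are hypothetical, and the sign convention for each trait (for example, whether lower fry color scores are preferred) would need to be set by the breeder.

```python
from statistics import mean, pstdev

def standardize(values):
    """Center to mean 0 and scale to standard deviation 1."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]

def selection_index(traits, weights):
    """traits: dict of trait name -> list of per-clone values.
    weights: dict of trait name -> index weight (hypothetical, breeder-defined).
    Returns one index value per clone."""
    z = {name: standardize(values) for name, values in traits.items()}
    n_clones = len(next(iter(traits.values())))
    return [sum(weights[name] * z[name][i] for name in traits) for i in range(n_clones)]

# Hypothetical values for three clones and two of the traits named above.
traits = {
    "total_yield": [40.0, 45.0, 50.0],
    "specific_gravity": [1.090, 1.085, 1.080],
}
weights = {"total_yield": 2.0, "specific_gravity": 1.0}  # illustrative only

index = selection_index(traits, weights)
ranking = sorted(range(len(index)), key=lambda i: index[i], reverse=True)
print(ranking)  # clone positions ordered from highest to lowest index
```

Standardizing first keeps traits measured on very different scales from dominating the index through units alone, so the weights express relative importance.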
The core of the training population (TP) for genomic selection consists of clones from the preliminary yield trial (FY3), as well as elite clones from other US breeding programs submitted to the National Chip Processing Trial, which is evaluated annually in both Wisconsin and Florida. Until 2023, genotyping of the TP utilized an Infinium SNP array, with 15K reliable markers on the most recent version. In 2023, we switched to using the FlexSeq (Rapid Genomics/LGC) targeted GBS. Both programs will select parents from the clonal evaluation trial (FY2) to realize a 3-year breeding cycle. Selection candidates will be genotyped with a 4K targeted GBS platform (DArTag) and imputed up to the TP marker set using randomForest classification trees. The target population size is 768, of which 200-250 will be selected at harvest based on visual selection (for tuber appearance and maturity) and genomic estimated breeding values. Marker effects will be computed with the updated prediction model in early October and used to develop an optimized mating plan with COMA, targeting an inbreeding rate of 1%.
Blueberry: The UF blueberry program makes up to 100 crosses annually, with target traits of yield, fruit quality (fruit size, firmness, Brix content, flavor), precocity, disease/pest resistance, and timing of production. From the 100 crosses, 20,000 progeny are planted in a high-density nursery. The first evaluation cycle, using phenotypic visual selection (culling), is carried out on one-year-old seedlings (Stage I), of which approximately 10% (2,000) pass to the second stage (Stage II). In the second year, with more fruit available for evaluation, a second round of selection advances 10% of the approximately 2,000 remaining plants to Stage III (200 genotypes). Genomic selection will be used in Stage II.
The program has genotyped the 2,000 individuals in Stage II since 2019 with the FlexSeq (Rapid Genomics/LGC) targeted GBS, as well as advanced selections from previous cohorts. For this project, marker effects will be predicted for all genotypes in Stage II, and the mating plan will be developed with COMA at 1% inbreeding rate.
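To give a sense of scale for the 1% inbreeding rate targeted throughout, the Python sketch below computes the cumulative inbreeding implied by a constant rate. It relies on the standard definition of inbreeding rate, ΔF = (F_t − F_{t−1}) / (1 − F_{t−1}), and assumes that the rate applies per breeding cycle and that inbreeding is measured relative to a base population with F = 0. The function name is hypothetical.

```python
def cumulative_inbreeding(rate, cycles, f_start=0.0):
    """Inbreeding after `cycles` breeding cycles at a constant rate.
    Follows from 1 - F_t = (1 - F_0) * (1 - rate) ** cycles."""
    return 1.0 - (1.0 - f_start) * (1.0 - rate) ** cycles

# 20 cycles at 1% per cycle (e.g., 20 years with a one-year breeding cycle).
print(round(cumulative_inbreeding(0.01, 20), 3))  # 0.182
```

Under these assumptions, a program with a longer breeding cycle completes fewer cycles over the same period, so its cumulative inbreeding at the same per-cycle rate is correspondingly lower.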