Progress 06/01/23 to 05/31/24
Outputs Target Audience: The target audience is plant breeding entities in both academia and industry, students with interests in contributing to plant breeding, engineers developing tools for measuring plants in new ways for breeding, and ultimately farmers and consumers who will use the products produced by breeding programs. We have presented our work at international and local conferences including representatives and students from both public and private breeding groups and undergraduate, graduate, and postdoctoral students at UC Davis and other California colleges. We have published our work in peer-reviewed publications in genetics-related journals and continued to maintain and update our open-source R packages based on feedback from users. Changes/Problems:
Nothing Reported
What opportunities for training and professional development has the project provided? This project has contributed to the training of one graduate student and a postdoctoral scholar in the development and application of quantitative genetics tools. How have the results been disseminated to communities of interest? This period we published three papers based on our work and presented the results at the GxExM symposium, the PAG conference, an AG2PI field day, the Zeaevolution seminar series, a Corteva New Frontiers conference, and to one industrial group. What do you plan to do during the next reporting period to accomplish the goals? Over the next period, during the no-cost extension, I will attend the 2024 AFRI PD meeting, where I will present a poster on the work, and the 2024 NAPD annual conference. We will also publish our work on extending MegaLMM to use environmental covariates for genetic value prediction in multi-environment trials.
Impacts What was accomplished under these goals?
In this year, we made progress in several directions.
1) We applied MegaLMM to multiple new breeding contexts. First, we collaborated with a cattle breeding group to study the use of MegaLMM to improve genetic value prediction of milk quality traits. Milk quality traits are complicated to study because they are highly dynamic during a cow's life and thus must be studied longitudinally. Near Infrared Spectroscopy is a high-throughput phenotyping technique that can be used to indirectly assay quality-related characteristics rapidly on milk samples. But this is a high-dimensional data source, so directly applying linear mixed models for genetic value estimation has been difficult using traditional approaches, requiring breeders to use approximate methods that don't fully leverage all information in the data. We applied MegaLMM to a dataset from a cattle breeding program, showing improved genetic value predictions with longitudinal data. The results were published in the Journal of Dairy Science. Second, there is significant interest in plant breeding in using controlled environment systems to carefully measure physiological traits using phenomics technologies and then leveraging these data to improve breeding for stress conditions in the field. We collaborated with a European consortium that used the PhenoArch platform to measure a suite of maize physiological traits under controlled conditions and in parallel ran a large set of field trials. We applied MegaLMM to integrate the chamber and field data to ask whether chamber data could improve genetic value estimation and prediction in the field. The results were promising, though perhaps not as successful as hoped, suggesting that field trials are generally more valuable than the chamber data in most cases. Nevertheless, MegaLMM provided more comprehensive answers to this question than were possible before, and we developed several analytical strategies that had improved performance.
2) We continued to develop new methodologies that build on the MegaLMM framework, allowing us to target new challenges in breeding. First, we used the capability of MegaLMM to learn the genetic and non-genetic correlations among multiple traits to design a new Genome-Wide Association Study (GWAS) approach for identifying genetic loci with effects on multiple traits. It has been known for many years that QTL mapping and GWAS have improved power when used to analyze multiple traits together, because correlations among traits can be leveraged both to control experimental noise and to find common weak patterns that together increase confidence in discoveries. However, studying multiple traits in GWAS poses additional challenges in controlling false-positive results compared with single-trait GWAS. By applying MegaLMM to multi-trait datasets to estimate correlations, and then using these outputs in GWAS models, we developed the JointGWAS R package, which is computationally efficient for multi-trait GWAS. We applied this to a set of ~50 traits measured on a large maize panel to identify loci derived from the wild relative teosintes that contribute to trait variation in modern maize. These results were published in Science. We also worked on an extension of MegaLMM to leverage environmental data to improve gene-environment interaction analyses from multi-environment trials, focusing on the goal of predicting genetic values in new environments. Our approach leverages high-dimensional environmental covariates to learn the relationships among trials. We tested our method on the maize GenomesToFields data. Results have been submitted to the journal Genetics and are currently under review.
3) Developing training material for users. We improved the documentation of MegaLMM by creating a pkgdown reference site (https://deruncie.github.io/MegaLMM/) and created a new vignette showing how to use MegaLMM to analyze data from multi-environment trials.
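The reason trait correlations matter in a multi-trait GWAS, as described under item 2, can be illustrated with a small joint test. The following is a minimal sketch in Python and is not the JointGWAS API. It assumes a single marker with effect estimates on two traits, and it assumes the correlation between those two estimates is already known (in practice it would be derived from estimated genetic and non-genetic covariances). The function name and the example numbers are hypothetical.

```python
import math


def joint_wald_two_traits(effects, std_errors, correlation):
    """Joint Wald test of one marker's effects on two traits.

    effects, std_errors: length-2 sequences (one entry per trait).
    correlation: assumed-known correlation between the two effect estimates.
    Returns (chi-square statistic with 2 df, p-value).
    """
    if not -1.0 < correlation < 1.0:
        raise ValueError("correlation must lie strictly between -1 and 1")
    z1 = effects[0] / std_errors[0]
    z2 = effects[1] / std_errors[1]
    # Quadratic form z' R^{-1} z for a 2 x 2 correlation matrix R.
    stat = (z1 * z1 - 2.0 * correlation * z1 * z2 + z2 * z2) / (1.0 - correlation ** 2)
    # For 2 degrees of freedom the chi-square survival function is exp(-x / 2).
    p_value = math.exp(-stat / 2.0)
    return stat, p_value


# Hypothetical marker: same-sign effects that follow a positive correlation.
same_sign = joint_wald_two_traits([0.4, 0.4], [0.2, 0.2], 0.6)
# Hypothetical marker: opposite-sign effects that go against that correlation.
opposite_sign = joint_wald_two_traits([0.4, -0.4], [0.2, 0.2], 0.6)
print(same_sign, opposite_sign)
```

In this toy setting the two markers have identical single-trait z-scores in absolute value, yet the joint statistic is larger for the marker whose effects run against the background correlation. Ignoring the correlation would give both markers the same joint statistic, which is one way false positives and missed signals can arise.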
Publications
- Type:
Journal Articles
Status:
Published
Year Published:
2024
Citation:
Chen, Yansen, Hadi Atashi, Jiayi Qu, Pauline Delhez, Daniel Runcie, Hélène Soyeurt, and Nicolas Gengler. "Exploring a Bayesian sparse factor model-based strategy for the genetic analysis of thousands of MIR-spectra traits for animal breeding." Journal of Dairy Science (2024).
- Type:
Journal Articles
Status:
Published
Year Published:
2024
Citation:
Baber Ali, Bertrand Huguenin-Bizot, Maxime Laurent, François Chaumont, Laurie C. Maistriaux, Stéphane Nicolas, Hervé Duborjal, Claude Welcker, François Tardieu, Tristan Mary-Huard, Laurence Moreau, Alain Charcosset, Daniel Runcie & Renaud Rincent. 2024. High-dimensional multi-omics measured in controlled conditions are useful for maize platform and field trait predictions. Theoretical and Applied Genetics, 137(7), p.175.
- Type:
Journal Articles
Status:
Published
Year Published:
2023
Citation:
Yang, Ning, Wang, Yuebin, Liu, Xiangguo, Jin, Minliang, Vallebueno-Estrada, Miguel, Calfee, Erin, Chen, Lu, Dilkes, Brian P., Gui, Songtao, Fan, Xingming, Harper, Thomas K., Kennett, Douglas J., Li, Wenqiang, Lu, Yanli, Ding, Junqiang, Chen, Ziqi, Luo, Jingyun, Mambakkam, Sowmya, Menon, Mitra, Snodgrass, Samantha, Veller, Carl, Wu, Shenshen, Wu, Siying, Zhuo, Lin, Xiao, Yingjie, Yang, Xiaohong, Stitzer, Michelle C., Runcie, Daniel, Yan, Jianbing, Ross-Ibarra, Jeffrey. 2023. Two teosintes made modern maize. Science. 2023 Dec 1;382(6674):eadg8940.
|
Progress 06/01/22 to 05/31/23
Outputs Target Audience: The target audience is plant breeding entities in both academia and industry, students with interests in contributing to plant breeding, engineers developing tools for measuring plants in new ways for breeding, and ultimately farmers and consumers who will use the products produced by breeding programs. We have presented our work at international and local conferences including representatives and students from both public and private breeding groups and undergraduate, graduate, and postdoctoral students at UC Davis and other California colleges. We have published our work in peer-reviewed publications in genetics-related journals and continued to maintain and update our open-source R packages based on feedback from users. Changes/Problems:
Nothing Reported
What opportunities for training and professional development has the project provided? This project has contributed to the training of two graduate students and a postdoctoral scholar in the development and application of quantitative genetics tools. How have the results been disseminated to communities of interest? We have published 7 papers based on these results over the duration of the project. This period, we presented work from this project at the Population, Evolutionary and Quantitative Genetics conference and the Maize Genetics conference. We developed and taught a module on the use of MegaLMM at the 2022 UC Davis Modern Programming in Genome to Phenome short course. We developed a pkgdown manual for MegaLMM, available here: https://deruncie.github.io/MegaLMM/ What do you plan to do during the next reporting period to accomplish the goals? Over the next reporting period, we will complete and publish our extension to MegaLMM to leverage environmental covariates in multi-environment and gene-environment interaction prediction. This will be accompanied by a new tutorial on its use and better documentation of the whole package on the GitHub page.
Impacts What was accomplished under these goals?
Objective 1: In the first year of the project we fully developed the multi-trait genomic framework MegaLMM for genomic prediction with up to tens of thousands of traits. In the second year of the project we focused on extending MegaLMM to additional genetic models. We developed an extension to MegaLMM called MegaBayesianAlphabet, which permits the suite of Bayesian Alphabet prior distributions for marker effects in genome-wide analyses. We focused on BayesC because it is one of the most commonly used methods in the Bayesian Alphabet family and has been shown to often perform as well as or better than GBLUP for genomic prediction while at the same time providing feature selection and a list of markers driving the genetics. This latter application makes MegaLMM particularly useful as a tool for gene discovery and GWAS. We published a manuscript on this method in Genetics this year in which we show that MegaBayesC can outperform our previous MegaLMM in a specific genomic prediction case study and also that it can better prioritize genetic variants in GWAS under a range of genetic architectures. Under this objective we also published a study showing the importance of accounting for non-genetic correlations when estimating the genomic prediction accuracy in a wheat breeding program, and two studies investigating the genetic architecture of drought responses in a maize breeding population. This year, in addition to publishing the BayesC paper in Genetics, we published a paper in the International Journal of Molecular Sciences on using MegaLMM to jointly model haploid and doubled haploid (DH) maize lines in a breeding program. Our main focus this year has been on extending MegaLMM to leverage environmental covariates for predicting gene-environment interactions. The original MegaLMM model was purely empirical, only using observed covariances among traits / environments. 
However, to make predictions for unseen traits or environments we need to relate these learned covariances to the predictive variables, such as weather or soil variables. We have added this functionality to MegaLMM and have tested it in the Genomes2Fields maize dataset. We can show successful predictions into unseen environments in some contexts. A manuscript documenting this is under development and will be published in the next year. Objective 2: Under this objective, we have extended the MegaLMM R package to accommodate the Bayesian Alphabet priors on marker effects. To facilitate this we have significantly re-implemented much of the underlying C++ code to take advantage of the faster floating point arithmetic when possible. We have also developed a new R package called JointGWAS that takes the output of MegaLMM analyses and extracts GWAS associations at every marker on each trait or set of traits genome-wide. We are currently writing a manuscript describing how this approach provides a powerful way to account for correlated traits in GWAS. This year, we have added functionality to MegaLMM's R package to accept environmental covariance matrices as priors. This allows users to make predictions into unseen environments. Objective 3: We developed a module on the use of MegaLMM for the 2022 Modern Programming in Genome to Phenome short course at UC Davis. Approximately 25 students attended, a mix of domestic and international graduate students and postdocs. For the module, we developed a new tutorial demonstrating the use of MegaLMM for multi-environment trial analysis. This module has since been published on the GitHub page of MegaLMM.
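The idea of relating learned factor structure to environmental covariates, so that predictions can be made for an environment with no phenotype data, can be sketched in a few lines. The following Python example is an illustration under strong simplifying assumptions and is not the MegaLMM implementation: it assumes factor scores per line and factor loadings per observed environment are already estimated, uses a single environmental covariate, and relates loadings to that covariate by ordinary least squares rather than through a prior. All function names and numbers are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def predict_new_environment(factor_scores, loadings, env_covariate, new_covariate):
    """Predict genetic values of each line in an unobserved environment.

    factor_scores: one list of K factor scores per line.
    loadings: K lists, each holding that factor's loading in every observed environment.
    env_covariate: covariate value (e.g. mean temperature) of each observed environment.
    new_covariate: covariate value of the unobserved environment.
    """
    new_loadings = []
    for factor_loadings in loadings:
        intercept, slope = fit_line(env_covariate, factor_loadings)
        new_loadings.append(intercept + slope * new_covariate)
    # Genetic value = sum over factors of (factor score * predicted loading).
    return [sum(s * l for s, l in zip(scores, new_loadings)) for scores in factor_scores]


# Hypothetical inputs: three observed environments, two factors, two lines.
env_temperature = [20.0, 25.0, 30.0]
loadings = [
    [1.0, 1.0, 1.0],   # factor 1: stable across environments
    [-1.0, 0.0, 1.0],  # factor 2: loading increases with temperature
]
factor_scores = [
    [2.0, 1.0],   # line A
    [1.0, -2.0],  # line B
]
predicted = predict_new_environment(factor_scores, loadings, env_temperature, 27.5)
print(predicted)
```

The design point is that once loadings are tied to measurable covariates, a new environment only needs covariate values, not phenotypes, to receive a prediction. A purely empirical covariance among observed environments cannot be extended in this way.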
Publications
- Type:
Journal Articles
Status:
Published
Year Published:
2022
Citation:
Hu, H., Meng, Y., Liu, W., Chen, Shaojiang, and D. E. Runcie. Multi-Trait Genomic Prediction Improves Accuracy of Selection among Doubled Haploid Lines in Maize. International Journal of Molecular Sciences. (2022), 23(23), 14558
- Type:
Journal Articles
Status:
Published
Year Published:
2022
Citation:
Qu, J., Runcie, D.E., and H. Cheng, Mega-scale Bayesian regression methods for genome-wide prediction and association studies with thousands of traits. Genetics. Volume 223, Issue 3, March 2023, iyac183
|
Progress 06/01/21 to 05/31/22
Outputs Target Audience: The target audience is plant breeding entities in both academia and industry, students with interests in contributing to plant breeding, engineers developing tools for measuring plants in new ways for breeding, and ultimately farmers and consumers who will use the products produced by breeding programs. We have presented our work at international and local conferences including representatives and students from both public and private breeding groups and undergraduate, graduate, and postdoctoral students at UC Davis and other California colleges. We have published our work in peer-reviewed publications in genetics-related journals and continued to maintain and update our open-source R packages based on feedback from users. Changes/Problems: We were unable to host the "Modern Programming in Genomic Prediction" workshop in 2021 due to the COVID-19 pandemic. We will host the workshop and contribute a module on multi-trait analysis featuring MegaLMM in 2022. We were unable to host a post-doc exchange with CIMMYT because of the pandemic. What opportunities for training and professional development has the project provided? This project has contributed to the training of three graduate students and a postdoctoral scholar in the development and application of quantitative genetics tools. How have the results been disseminated to communities of interest? The first paper demonstrating the MegaLMM method was published in Genome Biology. We also published three other papers this year and have a fourth in review at Genetics. We gave presentations on this work at the Plant and Animal Genomes conference and the NCCC170 working group. What do you plan to do during the next reporting period to accomplish the goals? Objective 1: This objective is largely complete. Objective 2: We will focus on documentation of the MegaLMM and JointGWAS packages. 
A particular focus of the software development will be implementing a better issue tracking system and providing documentation for model diagnostics. Objective 3: We will develop a module for the 2022 workshop on quantitative genetics at UC Davis. This is a one-week workshop that attracts students, scientists, and industrial staff with diverse backgrounds. The material will include two case studies of MegaLMM, one with wheat and the other with corn.
Impacts What was accomplished under these goals?
Objective 1: In the first year of the project we fully developed the multi-trait genomic framework MegaLMM for genomic prediction with up to tens of thousands of traits. In the second year of the project we have focused on extending MegaLMM to additional genetic models. We have developed an extension to MegaLMM called MegaBayesianAlphabet, which permits the suite of Bayesian Alphabet prior distributions for marker effects in genome-wide analyses. We focused on BayesC because it is one of the most commonly used methods in the Bayesian Alphabet family and has been shown to often perform as well as or better than GBLUP for genomic prediction while at the same time providing feature selection and a list of markers driving the genetics. This latter application makes MegaLMM particularly useful as a tool for gene discovery and GWAS. We have a manuscript describing MegaBayesC in review in Genetics in which we show that MegaBayesC can outperform our previous MegaLMM in a specific genomic prediction case study and also that it can better prioritize genetic variants in GWAS under a range of genetic architectures. As a case study, we applied MegaBayesC to a dataset from Arabidopsis to identify genetic variants associated with flowering time. In this dataset each accession was assayed for both flowering time and transcriptomic variation. We leveraged the transcriptomic data to better identify variants associated with flowering time. We used this as a trial dataset because flowering genetics is well characterized in Arabidopsis, and we were able to show that 14 of the 15 most strongly associated variants were close to well-known flowering time-regulating genes in Arabidopsis. This contrasts with much lower enrichment of the top hits from standard GWAS in this same dataset. 
Under this objective we also published a study showing the importance of accounting for non-genetic correlations when estimating the genomic prediction accuracy in a wheat breeding program, and two studies investigating the genetic architecture of drought responses in a maize breeding population. Objective 2: Under this objective, we have extended the MegaLMM R package to accommodate the Bayesian Alphabet priors on marker effects. To facilitate this we have significantly re-implemented much of the underlying C++ code to take advantage of the faster floating point arithmetic when possible. We have also developed a new R package called JointGWAS that takes the output of MegaLMM analyses and extracts GWAS associations at every marker on each trait or set of traits genome-wide. We are currently writing a manuscript describing how this approach provides a powerful way to account for correlated traits in GWAS. Objective 3: Due to the pandemic our summer course in Quantitative Genetics was canceled in 2021. We will hold this course in August 2022 (https://shortcourse.qtl.rocks/) and are working on registration and developing training materials for this course. We will include a module on the use of MegaLMM for genomic prediction in this course.
Publications
- Type:
Journal Articles
Status:
Published
Year Published:
2021
Citation:
Hu, H., Campbell, M.T., Yeats, T.H., Zheng, X., Runcie, D.E., Covarrubias-Pazaran, G., Broeckling, C., Yao, L., Caffe-Treml, M., Gutiérrez, L., Smith, K.P., Tanaka, J., Hoekenga, O.A., Sorrells, M.E., Gore, M.A., and Jean-Luc Jannink. Multi-omics prediction of oat agronomic and seed nutritional traits across environments and in distantly related populations. Theoretical and Applied Genetics, volume 134, pages 4043-4054 (2021)
- Type:
Journal Articles
Status:
Published
Year Published:
2022
Citation:
Hudson, A.I., Odell, S.G., Dubreuil, P., Tixier, M-H., Praud, S., Runcie, D.E., and Jeffrey Ross-Ibarra. Analysis of genotype by environment interactions in a maize mapping population. G3 Genes|Genomes|Genetics, Volume 12, Issue 3, March 2022, jkac013
- Type:
Journal Articles
Status:
Published
Year Published:
2022
Citation:
Odell, S.G., Hudson, A.I., Dubreuil, P., Tixier, M-H., Praud, S., Ross-Ibarra, J., and D.E. Runcie. Modeling Allelic Diversity of Multi-parent Mapping Populations Affects Detection of Quantitative Trait Loci. G3 Genes|Genomes|Genetics, Volume 12, Issue 3, March 2022, jkac011
|
Progress 06/01/20 to 05/31/21
Outputs Target Audience: The target audience is plant (and animal) breeders and breeding programs in the public and private sectors. We also aim to reach graduate students who aim to increase their familiarity with statistical methodology. Changes/Problems: We will be unable to host the "Modern Programming in Genomic Prediction" workshop in 2021 due to the COVID-19 pandemic. We will host the workshop and contribute a module on multi-trait analysis featuring MegaLMM in 2022. We were unable to recruit a postdoc until towards the end of the first year, also because of the pandemic, but a postdoc has started now and will continue through at least the coming reporting period. A graduate student helped with some of the activities this year instead. What opportunities for training and professional development has the project provided? This project has contributed to the training of two graduate students and a postdoctoral scholar in the development and application of quantitative genetics tools. How have the results been disseminated to communities of interest? The first paper demonstrating the MegaLMM method has been accepted at Genome Biology. We have given presentations on the approach and results at 5 conferences and workshops in the field of quantitative genetics or maize genetics, and to a private company working on crop breeding. What do you plan to do during the next reporting period to accomplish the goals? To address the first objective, we will extend MegaLMM to accommodate genetic marker data directly using the BayesC prior, and evaluate whether MegaLMM can function as a tool for multivariate genome-wide association studies. To address the second objective, we will refine the documentation of MegaLMM to make it more complete and create a web page to help users get started using MegaLMM. A particular focus of the software development will be implementing a better issue tracking system and providing documentation for model diagnostics. 
To address the third objective, we will refine several case-studies of MegaLMM to use as teaching vignettes. We will start with the wheat and corn datasets described above. These will eventually be used in our workshop on quantitative genetics in breeding that will occur in the third year of this project.
Impacts What was accomplished under these goals?
The central goal of this project is to develop statistical methods and software that enable the efficient and practical use of multi-trait data in breeding programs. Multi-trait data includes multiple quality measures of a single individual (plant, genotype, animal, etc.), high-throughput phenotyping data, and measures of genotypes across many environments. In each case the total combined information about the quality of a candidate line in a breeding program from all traits together is greater than the information in any individual trait. However, existing statistical methods and software are not capable of jointly analyzing many traits at once. We have demonstrated using two case studies from wheat and corn breeding programs that our methods improve the accuracy of selections, make more efficient use of all available data, and are feasible to apply to breeding programs using widely available computer systems. Specifically, under our first objective, we have fully developed and published the framework of a multi-trait statistical method called MegaLMM that incorporates genomic data and high-dimensional phenotypic data in a single multi-trait linear mixed model using the technique of Bayesian factor analysis to achieve statistical robustness. We showed that we can model data from a wide range of experimental contexts by including experimental design factors as fixed effects and multiple genetic or environmental terms as random effects. We derived a Markov chain Monte Carlo method to fit the model. We showed that our approach improved genomic prediction accuracy in a wheat breeding program by up to 74% using data from hyperspectral reflectance and in a corn breeding program by up to 20% using data from multi-environment trials. Under our second objective, we developed and published an open-source R package, also called MegaLMM, that is licensed under the MIT license and is available on GitHub. 
We developed a number of new computational algorithms to make the model computationally efficient, including the careful storage of intermediate calculations, a grid-based sampling algorithm for certain parameters, and an efficient method for dealing with the patterned missing data that is common in many breeding program contexts. We showed that our software could fit linear mixed models with at least 20,000 traits and 650 observations in less than a day, while other programs could only fit dozens to a few hundred traits using simpler, less complete models. We have not started on the third objective in this reporting period.
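To illustrate why a factor-analytic structure keeps a model with very many traits tractable, the following is a minimal sketch in Python. It is not code from the MegaLMM package. It assumes independent, unit-variance latent factors and one residual variance per trait, and it ignores the separate genetic and non-genetic covariance components described above. The function names, the toy loadings, and the choice of 50 factors are hypothetical.

```python
def implied_covariance(loadings, residual_var):
    """Trait covariance implied by a factor model.

    loadings: K lists (one per factor), each with one loading per trait.
    residual_var: one trait-specific residual variance per trait.
    With independent unit-variance factors, Cov = Lambda' Lambda + diag(residual_var).
    """
    n_traits = len(residual_var)
    cov = [[0.0] * n_traits for _ in range(n_traits)]
    for i in range(n_traits):
        for j in range(n_traits):
            cov[i][j] = sum(factor[i] * factor[j] for factor in loadings)
        cov[i][i] += residual_var[i]
    return cov


def n_params_factor(n_traits, n_factors):
    """Loadings plus one residual variance per trait."""
    return n_factors * n_traits + n_traits


def n_params_unstructured(n_traits):
    """Free entries of a symmetric n_traits x n_traits covariance matrix."""
    return n_traits * (n_traits + 1) // 2


# Toy example: 2 factors, 3 traits.
loadings = [
    [1.0, 0.5, 0.0],
    [0.0, 0.5, 1.0],
]
residual_var = [0.5, 0.5, 0.5]
cov = implied_covariance(loadings, residual_var)
print(cov)

# Parameter counts at the scale mentioned in the text (20,000 traits),
# with a hypothetical 50 factors.
print(n_params_factor(20000, 50), n_params_unstructured(20000))
```

Under these assumptions the number of covariance parameters grows linearly with the number of traits instead of quadratically, which is the basic reason a factor structure can remain estimable when traits far outnumber observations.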
Publications
- Type:
Journal Articles
Status:
Accepted
Year Published:
2021
Citation:
Daniel E Runcie, Jiayi Qu, Hao Cheng, Lorin Crawford (2021) MegaLMM: Mega-scale linear mixed models for genomic predictions with thousands of traits. Genome Biology. Accepted Article.
- Type:
Journal Articles
Status:
Submitted
Year Published:
2021
Citation:
Abelardo Montesinos-López, Daniel Runcie, Maria Itria Ibba, Paulino Pérez-Rodríguez, Osval A. Montesinos-López, Leonardo A. Crespo, Alison Bentley, and José Crossa. Measurements for multi-trait genomic-enabled prediction accuracy in multi-years breeding trials. Submitted to G3.
|