Progress 09/15/23 to 09/14/24
Outputs Target Audience: The target audience includes agricultural scientists, geneticists, and biostatisticians seeking advanced tools for identifying and analyzing important genes in crops and livestock. It also includes livestock breeders, crop scientists, biotechnology companies, academic researchers, and policymakers focused on enhancing agricultural productivity and sustainability. Changes/Problems:
Nothing Reported
What opportunities for training and professional development has the project provided? Two statistics graduate students received training in developing and optimizing statistical methods. Additionally, one plant science and two animal science graduate students received training in data analysis and interdisciplinary collaboration with researchers from other fields. How have the results been disseminated to communities of interest? The newly developed method was presented in an online workshop on May 17, 2024, with hands-on training for combined GWAS and TWAS using SVEN from the R package bravo. The workshop attracted 317 registered participants from 52 countries, received positive feedback, and has since garnered 207 YouTube views and four downloads of related materials. In addition, we are preparing three manuscripts for submission to scientific journals. What do you plan to do during the next reporting period to accomplish the goals?
Nothing Reported
Impacts What was accomplished under these goals?
Impact statement: Agriculture in the 21st century faces significant challenges: it must produce more food, feed, and fiber for a growing population with diverse dietary preferences, all while dealing with limited farmland, a shrinking rural workforce, and increased demand for bio-energy resources. Additionally, it must adopt sustainable methods and address the rising impacts of climate change. Although global crop productivity has improved over the past 60 years, continued advancements are necessary to meet these demands. This requires investing in genetic improvements for crops and livestock, studying key agricultural species in real-world conditions, and identifying genes critical to U.S. agriculture. It is also essential to share these findings with breeders and the agricultural community, while fostering the development of skilled professionals through education and outreach initiatives.
GWAS (Genome-Wide Association Studies), TWAS (Transcriptome-Wide Association Studies), and eQTL (expression Quantitative Trait Locus) mapping are powerful tools for identifying the genetic basis of complex traits, validating genes, and guiding genetic improvements in crops and livestock. However, existing methods typically perform GWAS and TWAS separately, combining results afterward through statistical tests, which can limit their ability to detect causal genes. Additionally, many traits are measured in non-Gaussian formats, such as ordered categorical scores (e.g., crop disease ratings), time series (e.g., growth data), or functional curves (e.g., photosynthetic responses). Current models often overlook nonlinear relationships between gene expression and traits, reducing their predictive power. Therefore, innovative generalized or nonlinear models are necessary to enhance these studies.
To address these gaps, we have developed new Bayesian models integrating GWAS and TWAS in a single hierarchical framework, incorporating effect size shrinkage and model penalties to manage confounding factors.
Separate models are being designed for different types of response variables, such as ordinal data. Furthermore, the project has started training three graduate students in the interdisciplinary fields of genetics, breeding, and statistics. These students are helping disseminate the methods to the broader research community by assisting with data analyses and hosting hybrid workshops. Additionally, software is being developed to promote broad application and advance U.S. agricultural goals through research and capacity-building efforts.
For Objective #1, we have successfully extended the SVEN methodology to perform GWAS and TWAS jointly through a single Bayesian hierarchical model. The method was presented in an online workshop on May 17, 2024, which included hands-on training for combined GWAS and TWAS using SVEN from the R package bravo. The workshop had 317 registered participants representing 52 countries, and we received several positive comments from them. The recording has had 207 views since it was posted on YouTube (May 17, 2024), and four people downloaded the workshop-related materials. Methods are currently being developed to incorporate possible group structures among the markers and nonlinear effects of gene expression in the combined GWAS and TWAS. The corresponding implementation in the bravo package is also in progress. The team at the University of Nebraska-Lincoln (UNL) generated, curated, and transferred to the statistics team at Iowa State University (ISU) two large datasets of matched genotype, transcript, and phenotype data for flowering time in large maize and sorghum diversity panels. By mining the literature, we generated a set of high-confidence flowering time genes to use as ground truth for evaluating model performance.
For Objective #2, we have almost completed the development of a multi-locus GWAS method for ordinal traits. While we have yet to analyze real datasets from the different biology teams at ISU, we are testing our methodology on simulated datasets. Next, we will extend the methodology to combine GWAS, TWAS, and eQTL mapping of ordinal traits.
Publications
- Type:
Other
Status:
Submitted
Year Published:
2024
Citation:
Roy, V. (2024) A geometric approach to informed MCMC sampling, https://arxiv.org/abs/2406.09010
- Type:
Other
Status:
Submitted
Year Published:
2024
Citation:
Rao, Y. and Roy, V. (2024) Necessary and sufficient conditions for
posterior propriety for generalized linear mixed models,
https://arxiv.org/abs/2302.00665
- Type:
Book Chapters
Status:
Submitted
Year Published:
2024
Citation:
Roy, V., Khare, K., and Hobert, J. P. (2024) The data augmentation
algorithm, https://arxiv.org/abs/2406.10464, Handbook of Markov chain
Monte Carlo, 2nd Edition, Steve Brooks, Andrew Gelman, Galin L. Jones
and Xiao-Li Meng eds., Chapman & Hall/CRC, to appear
- Type:
Other
Status:
Other
Year Published:
2024
Citation:
Escamilla, D.M., D. Li, K.L. Negus, K.L. Kappelmann, A. Kusmec, A.E. Vanous, P.S. Schnable, X. Li, and J. Yu*. Genomic selection: essence, applications, and prospects. Plant Genome. - in preparation