Progress 09/15/24 to 09/14/25
Outputs Target Audience: The target audience includes agricultural scientists, geneticists, and biostatisticians seeking advanced tools for identifying and analyzing important genes in crops and livestock. It also appeals to livestock breeders, crop scientists, biotechnology companies, academic researchers, and policymakers focused on enhancing agricultural productivity and sustainability. Changes/Problems:
Nothing Reported
What opportunities for training and professional development has the project provided? Seven graduate students in statistics received training in developing advanced statistical methods, implementing software, and conducting interdisciplinary research at the interface of statistics, genetics, and breeding. Additionally, three plant science and two animal science graduate students received training in data analysis and interdisciplinary collaboration with researchers from other fields. How have the results been disseminated to communities of interest? Findings have been disseminated via conference presentations, publications in peer-reviewed journals, and contributions to book chapters. What do you plan to do during the next reporting period to accomplish the goals?
Nothing Reported
Impacts What was accomplished under these goals?
Impact statement: This project addresses the pressing need to improve agricultural productivity and resilience amid climate and resource challenges by advancing statistical tools for genetic improvement. Building on last year's progress, we developed hierarchical Bayesian models that jointly perform GWAS and TWAS, enabling more accurate identification of causal genes and regulatory pathways. Implemented in open-source software, these integrative methods enhance the precision and interpretability of genetic analyses and are readily applicable across crops and livestock. Applying this framework to flowering time in sorghum, a key adaptive trait influencing yield stability, demonstrated how combining GWAS and TWAS can reveal biologically meaningful pathways, such as ageing-related MADS-box and SBP transcription factors, that are directly relevant to breeding for resilience. Beyond research outcomes, the project is fostering workforce development by training graduate students at the intersection of statistics, genetics, and breeding. Through hands-on collaboration with biologists and breeders, trainees are gaining experience in applying advanced statistical tools to real-world agricultural problems. Together, these efforts contribute to U.S. agricultural sustainability by accelerating genetic discovery, enhancing data-driven breeding, and preparing a new generation of interdisciplinary scientists.
Objective #1 We developed new hierarchical Bayesian models for jointly performing GWAS and TWAS. The proposed framework effectively incorporates potential group structures among markers and accounts for nonlinear effects of gene expression. Theoretical guarantees for the method have been established. The method has been implemented in the existing R package bravo. In parallel, our statistics group developed specialized software for GWAS applications in crop and livestock species. Collaborating biologists are currently integrating this software into their genetic data preprocessing pipelines. The project has also contributed to workforce development by training four graduate students in statistics, focusing on the interdisciplinary integration of genetics, breeding, and statistical methodology.
The UNL team applied an integrated GWAS-TWAS approach to analyze flowering time in sorghum. GWAS alone identified several genomic regions, such as SbFT8 and a locus near miR172, though many associations lacked strong statistical confidence. In contrast, TWAS pinpointed candidate genes whose expression levels were significantly correlated with flowering time, including MADS-box genes, SBP transcription factors targeted by miR156, and FT-like paralogs. Both methods converged on the ageing pathway, highlighting the central role of small RNAs and their downstream transcription factors in regulating flowering time variation. These findings illustrate how GWAS excels at detecting regulatory variants at the genomic level, while TWAS captures downstream expression effects. Together, the two approaches provide complementary insights, increasing confidence in the identified candidate pathways. Overall, these results demonstrate the practical impact of developing joint GWAS-TWAS methods: integrative models not only enhance gene discovery but also yield biologically meaningful targets for breeding. This directly advances Objective #1 and lays the groundwork for future work under Objectives #2 and #3, which will extend these methods to ordinal and functional trait data.
Objective #2 Despite some initial challenges, we have made strong progress in developing a multi-locus GWAS method for ordinal traits. Our next step is to apply the method to real datasets provided by our biology collaborators at ISU and UNL.
Publications
- Type:
Peer Reviewed Journal Articles
Status:
Published
Year Published:
2025
Citation:
Run Wang, Somak Dutta, Vivekananda Roy. (2025) Bayesian Iterative Screening in Ultra-high Dimensional Linear Regressions. Bayesian Analysis, Advance Publication 1-26 2025. https://doi.org/10.1214/25-BA1517
- Type:
Peer Reviewed Journal Articles
Status:
Published
Year Published:
2025
Citation:
Rao, Y. and Roy, V. (2025). Necessary and sufficient conditions for posterior propriety for generalized linear mixed models. Sankhya, Series A, 87, 157-190
- Type:
Book Chapters
Status:
Published
Year Published:
2024
Citation:
Roy, V., Khare, K. and Hobert, J. P. (2024). The Data Augmentation Algorithm. In Handbook of Markov Chain Monte Carlo, Second Edition (eds. Steve Brooks, Andrew Gelman, Galin L. Jones, and Xiao-Li Meng), Chapman & Hall/CRC.
- Type:
Peer Reviewed Journal Articles
Status:
Published
Year Published:
2025
Citation:
Davis, J. M., Coffey, L. M., Turkus, J., López-Corona, L., Linders, K., Ullagaddi, C., Santra, D. K., Schnable, P. S., & Schnable, J. C. (2025). Assessing the impact of yield plasticity on hybrid performance in maize. Physiologia Plantarum. Advance online publication. https://doi.org/10.1111/ppl.70278
- Type:
Conference Papers and Presentations
Status:
Published
Year Published:
2025
Citation:
Global-local MCMC using Riemannian geometry. Vivekananda Roy. Fast and the Curious 2, Toronto, Canada, September 2025.
- Type:
Conference Papers and Presentations
Status:
Published
Year Published:
2025
Citation:
A geometric approach to informed MCMC sampling. Vivekananda Roy. Joint Statistical Meetings, Nashville, USA, August 2025
- Type:
Conference Papers and Presentations
Status:
Published
Year Published:
2024
Citation:
Informed MCMC for Bayesian variable selection. Vivekananda Roy. CFE-CMStatistics, London, UK, December 2024.
- Type:
Conference Papers and Presentations
Status:
Published
Year Published:
2024
Citation:
Predicting the Unpredictable: Introduction to Monte Carlo Simulations, Vivekananda Roy. Ahmedabad University, India, August 2024.
Progress 09/15/23 to 09/14/24
Outputs Target Audience: The target audience includes agricultural scientists, geneticists, and biostatisticians seeking advanced tools for identifying and analyzing important genes in crops and livestock. It also appeals to livestock breeders, crop scientists, biotechnology companies, academic researchers, and policymakers focused on enhancing agricultural productivity and sustainability. Changes/Problems:
Nothing Reported
What opportunities for training and professional development has the project provided? Two statistics graduate students received training in developing and optimizing statistical methods. Additionally, one plant science and two animal science graduate students received training in data analysis and interdisciplinary collaboration with researchers from other fields. How have the results been disseminated to communities of interest? The newly developed method was presented in an online workshop on May 17, 2024, with hands-on training for combined GWAS and TWAS using SVEN from the R package bravo. The workshop attracted 317 participants from 52 countries, received positive feedback, and has since garnered 207 YouTube views and four downloads of related materials. In addition, we are preparing three manuscripts for submission to scientific journals to disseminate the results to communities of interest. What do you plan to do during the next reporting period to accomplish the goals?
Nothing Reported
Impacts What was accomplished under these goals?
Impact statement: Agriculture in the 21st century faces significant challenges: it must produce more food, feed, and fiber for a growing population with diverse dietary preferences, all while dealing with limited farmland, a shrinking rural workforce, and increased demand for bio-energy resources. Additionally, it must adopt sustainable methods and address the rising impacts of climate change. Although global crop productivity has improved over the past 60 years, continued advancements are necessary to meet these demands. This requires investing in genetic improvements for crops and livestock, studying key agricultural species in real-world conditions, and identifying genes critical to U.S. agriculture. It is also essential to share these findings with breeders and the agricultural community, while fostering the development of skilled professionals through education and outreach initiatives.
GWAS (Genome-Wide Association Studies), TWAS (Transcriptome-Wide Association Studies), and eQTL (expression Quantitative Trait Locus) mapping are powerful tools for identifying the genetic basis of complex traits, validating genes, and guiding genetic improvements in crops and livestock. However, existing methods typically perform GWAS and TWAS separately, combining results afterward through statistical tests, which can limit their ability to detect causal genes. Additionally, many traits are measured in non-Gaussian formats, such as ordered categorical scores (e.g., crop disease ratings), time series (e.g., growth data), or functional curves (e.g., photosynthetic responses). Current models often overlook nonlinear relationships between gene expression and traits, reducing their predictive power. Therefore, innovative generalized or nonlinear models are necessary to enhance these studies.
To address these gaps, we have developed new Bayesian models integrating GWAS and TWAS in a single hierarchical framework, incorporating effect size shrinkage and model penalties to manage confounding factors. Separate models are being designed for different types of response variables, such as ordinal data. Furthermore, the project has started training three graduate students in the interdisciplinary fields of genetics, breeding, and statistics. These students are helping disseminate the methods to the broader research community by assisting with data analyses and hosting hybrid workshops. Additionally, software is being developed to promote broad application and advance U.S. agricultural goals through research and capacity-building efforts.
For Objective #1, we have now successfully extended the SVEN methodology for jointly performing GWAS and TWAS through a single Bayesian hierarchical model. The method was presented in an online workshop on May 17, 2024. During the workshop, hands-on training was given for combined GWAS and TWAS using SVEN from the R package bravo. There were 317 registered participants in this workshop representing 52 countries. We received several positive comments from the participants of the workshop. Also, there were 207 views of the recording since it was posted on YouTube (May 17, 2024) and 4 people downloaded the workshop-related materials. Currently, methods are being developed for incorporating possible group structures among the markers and nonlinear effects of the gene expression for the combined GWAS and TWAS. The corresponding implementation in the bravo package is also in progress. The team at University of Nebraska-Lincoln (UNL) generated, curated, and transferred two large datasets consisting of matched genotype, transcript and phenotype datasets for flowering time in large maize and sorghum diversity panels to the statistics team at Iowa State University (ISU). By mining the literature, we generated a set of high confidence flowering time genes to use as ground truth to evaluate model performance.
For Objective #2, we have almost completed the development of a multi-locus GWAS method for ordinal traits. While we have yet to analyze real datasets from the different biology teams of ISU, we are testing our methodology on simulated data sets. Next, we will extend the methodology for combining GWAS, TWAS, and eQTL mapping of ordinal traits.
Publications
- Type:
Other
Status:
Submitted
Year Published:
2024
Citation:
Roy, V. (2024) A geometric approach to informed MCMC sampling, https://arxiv.org/abs/2406.09010
- Type:
Other
Status:
Submitted
Year Published:
2024
Citation:
Rao, Y. and Roy, V. (2024) Necessary and sufficient conditions for posterior propriety for generalized linear mixed models, https://arxiv.org/abs/2302.00665
- Type:
Book Chapters
Status:
Submitted
Year Published:
2024
Citation:
Roy, V., Khare, K., and Hobert, J. P. (2024) The data augmentation algorithm, https://arxiv.org/abs/2406.10464, Handbook of Markov chain Monte Carlo, 2nd Edition, Steve Brooks, Andrew Gelman, Galin L. Jones and Xiao-Li Meng eds., Chapman & Hall/CRC, to appear
- Type:
Other
Status:
Other
Year Published:
2024
Citation:
Escamilla, D.M., D. Li, K.L. Negus, K.L. Kappelmann, A. Kusmec, A.E. Vanous, P.S. Schnable, X. Li, and J. Yu*. Genomic selection: essence, applications, and prospects. Plant Genome. - in preparation