Source: IOWA STATE UNIVERSITY submitted to
NIFA AG2PI COLLABORATIVE: IMPROVING CAUSAL GENE DETECTION ACROSS CROP AND LIVESTOCK SPECIES
Sponsoring Institution
National Institute of Food and Agriculture
Project Status
ACTIVE
Funding Source
Reporting Frequency
Annual
Accession No.
1031452
Grant No.
2023-70412-41087
Cumulative Award Amt.
$1,132,877.00
Proposal No.
2023-06073
Multistate No.
(N/A)
Project Start Date
Sep 15, 2023
Project End Date
Sep 14, 2026
Grant Year
2023
Program Code
[AG2PI]- Agricultural Genome to Phenome Initiative
Recipient Organization
IOWA STATE UNIVERSITY
2229 Lincoln Way
AMES, IA 50011
Performing Department
(N/A)
Non Technical Summary
In the face of 21st-century agricultural challenges, our mission is clear: we must produce more food, feed, and fiber for a growing population with evolving dietary preferences, while dealing with limited rural labor and agricultural land, and the need for bio-energy sources. Moreover, climate change introduces more frequent biotic and abiotic stresses. While global crop productivity has matched these challenges, we must intensify our efforts to sustain this progress. This is especially vital as we navigate the rest of the century.
To address these pressing needs, our team of experts, spanning crop and livestock breeding, genetics, biochemistry, and data science, is forging ahead. We're developing innovative tools to decode the genetic basis of traits in crops like maize, soybean, and sorghum, and in pigs. Our advanced statistical models, enhancing methods like GWAS, TWAS, and eQTL mapping, empower biologists to explore data in groundbreaking ways, uncovering new insights.
We are bridging the gap between genetics and traits, from crop yields to vitamin B levels in maize. Our research probes the interplay of genetics, weather, and environment using diverse data. This newfound knowledge will steer enhancements in crucial U.S. crops and livestock.
Our ambitious endeavor extends beyond discovery. It entails crafting novel statistical tools to comprehend essential genes in both livestock and crops, applicable across species. Aligned with the USDA's strategic goals, our work contributes to an equitable, resilient, and prosperous U.S. agricultural system, ensuring accessible, wholesome food for all. Through education and outreach, we'll empower crop and livestock breeders and cultivate the human capital needed to fulfill these aspirations.
Animal Health Component
0%
Research Effort Categories
Basic
100%
Applied
0%
Developmental
0%
Classification

Knowledge Area (KA) | Subject of Investigation (SOI) | Field of Science (FOS) | Percent
201 | 2410 | 1060 | 25%
201 | 7310 | 2090 | 25%
304 | 3910 | 1060 | 25%
304 | 7310 | 2090 | 25%
Goals / Objectives
The overarching goal of this project is to develop and support new statistical tools for the breeding of superior individuals or cultivars in genetic populations, with the long-term goal of enhancing the production, sustainability, and climate and disease resilience of crop and livestock species. One way to enhance livestock and crop breeding strategies is by better understanding gene-trait associations and prioritizing causal genes of diverse agriculturally important phenotypic traits. Towards this end, the project will bring together researchers from a variety of disciplines, including phenomics, genomics, genetic diversity, and data science. Biologists will bring their own biological questions and datasets from different crop and livestock species. Statisticians, with extensive experience in collaborations with biologists, will build statistical models and methodologies to analyze these datasets. The models and analyses will be updated iteratively following feedback from the biologists.
Our objectives in this project are: (1) to build powerful multi-locus methods for combined GWAS, TWAS, and expression quantitative trait loci (eQTL) mapping, (2) to develop user-friendly open-source R packages and Python libraries with detailed manuals, vignettes, and video tutorials, and (3) to interweave research and education through the integration of training and cross-disciplinary research toward producing a skilled STEM agricultural workforce. We plan to achieve these research objectives by pursuing the following three specific objectives:
Objective #1: Develop methods to combine GWAS, TWAS, and eQTL mapping of quantitative traits. Our working hypothesis is that a hierarchy of high-dimensional partial-linear and linear models, with appropriate shrinkage on SNP and gene expression effects, will be able to mitigate the confounding effects. In this objective, we will focus on traits for which the responses can be assumed to be univariate or multivariate Gaussian (normally distributed), possibly after a suitable transformation (e.g., log).
Objective #2: Develop methods to combine GWAS, TWAS, and eQTL mapping of ordinal traits. The non-Gaussian traits we will focus on are ordinal scores (e.g., disease and root lodging scores). Our working hypothesis is that we will be able to improve the association results by properly accounting for the nature of the non-Gaussianity through an appropriate hierarchical generalized partial-linear multi-locus model. We will also retain the advantages of Objective #1.
Objective #3: Develop methods to combine GWAS, TWAS, and eQTL mapping of functional data traits. Here, we will develop multi-locus methods for traits that are measured by a smooth curve (e.g., repeatedly measured phenotypes such as growth rates, time-series, light curves, A/Ci curves). Our working hypothesis is that we will be able to improve the understanding of the genetic basis of variations in the whole trait curves instead of being limited to univariate analyses of summary measurements or independent analyses of individual time points.
Alongside the research outcomes, these initiatives will enhance expertise in agricultural genome-to-phenome research through education and outreach activities. Existing ISU courses will be improved by incorporating GWAS and TWAS methods into the syllabus. Outreach programs will provide education and support the research and training of a broad range of crop and livestock scientists at multiple types of U.S. institutions. Hybrid workshops will be organized to facilitate the training of students and scientists.
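To make the ordinal-trait setting of Objective #2 concrete, the following Python sketch simulates ordinal scores by thresholding a latent Gaussian trait that depends on a few SNPs and gene expression levels. It is only an illustration under stated assumptions: the function name `ordinal_scores`, the cutpoints, the sample sizes, and the effect sizes are hypothetical and are not taken from the project's models or data.

```python
import numpy as np

def ordinal_scores(latent, cutpoints):
    """Map latent continuous values to ordinal categories 0..K.

    `cutpoints` must be strictly increasing; a value equal to a
    cutpoint is assigned to the upper category.
    """
    cutpoints = np.asarray(cutpoints, dtype=float)
    if np.any(np.diff(cutpoints) <= 0):
        raise ValueError("cutpoints must be strictly increasing")
    return np.searchsorted(cutpoints, latent, side="right")

# Hypothetical toy data: 200 individuals, 50 SNPs, 20 expressed genes.
rng = np.random.default_rng(0)
n, n_snps, n_genes = 200, 50, 20
snps = rng.binomial(2, 0.3, size=(n, n_snps)).astype(float)  # 0/1/2 genotype codes
expression = rng.normal(size=(n, n_genes))

# Sparse effects: one SNP and one gene influence the latent trait.
snp_effects = np.zeros(n_snps)
snp_effects[0] = 0.8
gene_effects = np.zeros(n_genes)
gene_effects[0] = 1.0

latent = snps @ snp_effects + expression @ gene_effects + rng.normal(size=n)
latent -= latent.mean()  # center so the cutpoints straddle the data

# Four ordered categories (0..3), e.g. a coarse disease rating.
scores = ordinal_scores(latent, cutpoints=[-1.0, 0.0, 1.0])
```

Treating `scores` as if it were Gaussian ignores the unequal spacing between categories; a cumulative-link model instead estimates the cutpoints together with the SNP and expression effects.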
Project Methods
The statistical methods include the development of hierarchical Bayesian models for combining GWAS, TWAS, and eQTL mapping. Latent indicator variables will be assumed, and model size will be penalized through Bernoulli priors on these latent indicator vectors. Theoretical results will be developed for choosing the right shrinkage to accurately detect associated genes. Fast, scalable computational algorithms based on delayed Cholesky factorization and sparse-matrix algebra will be developed and implemented in the C++ programming language. These models and methods will be extended to accommodate more general phenotypic responses through link functions.
Three methods will be used to validate the association results: cross-validation with independent datasets from the literature, biological pathway analysis, and network analysis with functional enrichment (gene ontology, or GO, terms) analysis. In addition, simulation experiments will be conducted based on the literature.
Two hybrid workshops will be organized yearly to disseminate the research and software and to train the broader scientific community. Feedback will be sought from the workshop participants to assess the overall effectiveness of the workshops and to improve the accessibility of the software, manuals, and vignettes.
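As a rough illustration of the latent-indicator idea described above, the Python sketch below scores a candidate set of predictor columns (SNPs or gene expression levels) under a generic conjugate spike-and-slab linear model. It is a toy under stated assumptions, not the project's model or its C++ implementation: the function name `log_model_score`, the slab precision `lam`, the prior inclusion probability `w`, the flat prior on the intercept, and the 1/sigma^2 prior on the error variance are all chosen here for illustration, and the code uses an ordinary Cholesky factorization rather than the delayed factorization mentioned in the methods.

```python
import numpy as np

def log_model_score(gamma, X, y, lam=1.0, w=0.1):
    """Unnormalized log posterior of an inclusion vector `gamma`.

    Assumed model (illustrative only): y = mu + X_gamma b + e, with
    e ~ N(0, s2 I), b ~ N(0, (s2 / lam) I), flat prior on mu,
    p(s2) proportional to 1/s2, and gamma_j ~ Bernoulli(w) independently.
    """
    gamma = np.asarray(gamma, dtype=bool)
    n, p = X.shape
    k = int(gamma.sum())
    yc = y - y.mean()
    # Bernoulli prior on the indicators penalizes larger models when w < 0.5.
    log_prior = k * np.log(w) + (p - k) * np.log1p(-w)
    total_ss = yc @ yc
    if k == 0:
        return log_prior - 0.5 * (n - 1) * np.log(total_ss)
    Xg = X[:, gamma]
    Xg = Xg - Xg.mean(axis=0)
    A = Xg.T @ Xg + lam * np.eye(k)
    L = np.linalg.cholesky(A)               # A = L L^T
    v = np.linalg.solve(L, Xg.T @ yc)       # v^T v = y^T X A^{-1} X^T y
    ridge_rss = total_ss - v @ v
    log_det_A = 2.0 * np.sum(np.log(np.diag(L)))
    return (0.5 * k * np.log(lam) - 0.5 * log_det_A
            - 0.5 * (n - 1) * np.log(ridge_rss) + log_prior)

# Hypothetical toy data: columns 0 and 1 truly affect the trait.
rng = np.random.default_rng(1)
n, p = 200, 30
X = rng.binomial(2, 0.3, size=(n, p)).astype(float)
beta = np.zeros(p)
beta[[0, 1]] = 1.0
y = X @ beta + rng.normal(size=n)

true_model = np.zeros(p, dtype=bool)
true_model[[0, 1]] = True
null_model = np.zeros(p, dtype=bool)
wrong_model = np.zeros(p, dtype=bool)
wrong_model[[5, 6]] = True

score_true = log_model_score(true_model, X, y)
score_null = log_model_score(null_model, X, y)
score_wrong = log_model_score(wrong_model, X, y)
```

In this sketch the Cholesky factor supplies both the log-determinant and the ridge residual sum of squares, so each candidate model needs only one factorization of a k-by-k matrix, where k is the number of included predictors.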

Progress 09/15/23 to 09/14/24

Outputs
Target Audience: The target audience includes agricultural scientists, geneticists, and biostatisticians seeking advanced tools for identifying and analyzing important genes in crops and livestock. The work is also relevant to livestock breeders, crop scientists, biotechnology companies, academic researchers, and policymakers focused on enhancing agricultural productivity and sustainability.
Changes/Problems: Nothing Reported
What opportunities for training and professional development has the project provided? Two statistics graduate students received training in developing and optimizing statistical methods. Additionally, one plant science and two animal science graduate students received training in data analysis and interdisciplinary collaboration with researchers from other fields.
How have the results been disseminated to communities of interest? The newly developed method was presented in an online workshop on May 17, 2024, with hands-on training for combined GWAS and TWAS using SVEN from the R package bravo. The workshop attracted 317 registered participants representing 52 countries and received positive feedback. The recording has since received 207 YouTube views, and the workshop-related materials have been downloaded four times. In addition, we are preparing three manuscripts for submission to scientific journals to disseminate the results to communities of interest.
What do you plan to do during the next reporting period to accomplish the goals? Nothing Reported

Impacts
What was accomplished under these goals? Impact statement: Agriculture in the 21st century faces significant challenges: it must produce more food, feed, and fiber for a growing population with diverse dietary preferences, all while dealing with limited farmland, a shrinking rural workforce, and increased demand for bio-energy resources. Additionally, it must adopt sustainable methods and address the rising impacts of climate change. Although global crop productivity has improved over the past 60 years, continued advancements are necessary to meet these demands. This requires investing in genetic improvements for crops and livestock, studying key agricultural species in real-world conditions, and identifying genes critical to U.S. agriculture. It is also essential to share these findings with breeders and the agricultural community, while fostering the development of skilled professionals through education and outreach initiatives.
GWAS (Genome-Wide Association Studies), TWAS (Transcriptome-Wide Association Studies), and eQTL (expression Quantitative Trait Locus) mapping are powerful tools for identifying the genetic basis of complex traits, validating genes, and guiding genetic improvements in crops and livestock. However, existing methods typically perform GWAS and TWAS separately, combining results afterward through statistical tests, which can limit their ability to detect causal genes. Additionally, many traits are measured in non-Gaussian formats, such as ordered categorical scores (e.g., crop disease ratings), time series (e.g., growth data), or functional curves (e.g., photosynthetic responses). Current models often overlook nonlinear relationships between gene expression and traits, reducing their predictive power. Therefore, innovative generalized or nonlinear models are necessary to enhance these studies.
To address these gaps, we have developed new Bayesian models integrating GWAS and TWAS in a single hierarchical framework, incorporating effect size shrinkage and model penalties to manage confounding factors. Separate models are being designed for different types of response variables, such as ordinal data. Furthermore, the project has started training three graduate students in the interdisciplinary fields of genetics, breeding, and statistics. These students are helping disseminate the methods to the broader research community by assisting with data analyses and hosting hybrid workshops. Additionally, software is being developed to promote broad application and advance U.S. agricultural goals through research and capacity-building efforts.
For Objective #1, we have now successfully extended the SVEN methodology for jointly performing GWAS and TWAS through a single Bayesian hierarchical model. The method was presented in an online workshop on May 17, 2024. During the workshop, hands-on training was given for combined GWAS and TWAS using SVEN from the R package bravo. There were 317 registered participants in this workshop, representing 52 countries. We received several positive comments from the workshop participants. The recording has had 207 views since it was posted on YouTube (May 17, 2024), and 4 people downloaded the workshop-related materials. Currently, methods are being developed for incorporating possible group structures among the markers and nonlinear effects of gene expression for the combined GWAS and TWAS. The corresponding implementation in the bravo package is also in progress. The team at the University of Nebraska-Lincoln (UNL) generated, curated, and transferred two large datasets consisting of matched genotype, transcript, and phenotype data for flowering time in large maize and sorghum diversity panels to the statistics team at Iowa State University (ISU).
By mining the literature, we generated a set of high-confidence flowering time genes to use as ground truth to evaluate model performance.
For Objective #2, we have almost completed the development of a multi-locus GWAS method for ordinal traits. While we have yet to analyze real datasets from the different biology teams at ISU, we are testing our methodology on simulated datasets. Next, we will extend the methodology to combine GWAS, TWAS, and eQTL mapping of ordinal traits.
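One simple way a literature-derived gene set could serve as ground truth is to compare it with the genes a model selects. The Python sketch below computes precision and recall for that comparison. The function name `precision_recall` and the gene identifiers are hypothetical placeholders, and the report does not state which evaluation metrics the project uses.

```python
def precision_recall(selected_genes, ground_truth_genes):
    """Return (precision, recall) of a selected gene set against a reference set."""
    selected = set(selected_genes)
    truth = set(ground_truth_genes)
    true_positives = len(selected & truth)
    precision = true_positives / len(selected) if selected else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall

# Placeholder identifiers, not real gene names.
selected = ["gene_01", "gene_02", "gene_03", "gene_04"]
ground_truth = ["gene_01", "gene_02", "gene_05"]

precision, recall = precision_recall(selected, ground_truth)
```

Because a literature-derived list is unlikely to be exhaustive, a selected gene that is absent from it is not necessarily a false discovery, so precision computed this way is best read as a conservative estimate.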

Publications

  • Type: Other Status: Submitted Year Published: 2024 Citation: Roy, V. (2024) A geometric approach to informed MCMC sampling, https://arxiv.org/abs/2406.09010
  • Type: Other Status: Submitted Year Published: 2024 Citation: Rao, Y. and Roy, V. (2024) Necessary and sufficient conditions for posterior propriety for generalized linear mixed models, https://arxiv.org/abs/2302.00665
  • Type: Book Chapters Status: Submitted Year Published: 2024 Citation: Roy, V., Khare, K., and Hobert, J. P. (2024) The data augmentation algorithm, https://arxiv.org/abs/2406.10464, Handbook of Markov chain Monte Carlo, 2nd Edition, Steve Brooks, Andrew Gelman, Galin L. Jones and Xiao-Li Meng eds., Chapman & Hall/CRC, to appear
  • Type: Other Status: Other Year Published: 2024 Citation: Escamilla, D.M., D. Li, K.L. Negus, K.L. Kappelmann, A. Kusmec, A.E. Vanous, P.S. Schnable, X. Li, and J. Yu*. Genomic selection: essence, applications, and prospects. Plant Genome (in preparation).