Source: NORTH CAROLINA STATE UNIV submitted to
STUDY GENETIC BASIS AND PATHWAYS OF COMPLEX TRAITS
Sponsoring Institution
National Institute of Food and Agriculture
Project Status
TERMINATED
Funding Source
Reporting Frequency
Annual
Accession No.
0220039
Grant No.
(N/A)
Project No.
NC02328
Proposal No.
(N/A)
Multistate No.
(N/A)
Program Code
(N/A)
Project Start Date
Oct 1, 2009
Project End Date
Sep 30, 2014
Grant Year
(N/A)
Project Director
Zeng, ZH.
Recipient Organization
NORTH CAROLINA STATE UNIV
(N/A)
RALEIGH,NC 27695
Performing Department
Statistics
Non Technical Summary
Mapping quantitative trait loci (QTL) is a statistical analysis that establishes evidence that a genomic region contains causal genes for a quantitative trait. The analysis can tell us how many regions in the genome may harbor causal genes for the trait, what effect each region has on the trait, and how those regions might interact to affect the trait variation. QTL analysis is an important step that can lead to the identification and characterization of the trait genes. The project will produce a statistical analysis system that can be used to perform a comprehensive statistical analysis to map multiple QTL and identify QTL epistasis on multiple traits in multiple environments and cross populations from inbred lines. It can identify QTL based on their main and/or epistatic effects. It can map QTL that cause genetic correlations between traits and tell how much of the correlation is due to pleiotropy or linkage. It can analyze QTL from multiple populations and tell whether QTL identified from different traits or populations are the same QTL or different QTL in linkage. The method will be implemented in the widely used computer software Windows QTL Cartographer and distributed freely to the scientific community for general use in QTL mapping analysis. Since the initial release in 1995, QTL Cartographer has become the most widely used computer software for QTL mapping analysis.
Animal Health Component
(N/A)
Research Effort Categories
Basic
(N/A)
Applied
(N/A)
Developmental
(N/A)
Classification

Knowledge Area (KA)   Subject of Investigation (SOI)   Field of Science (FOS)   Percent
901                   7310                             1080                     50%
901                   7310                             2090                     50%
Goals / Objectives
Establishing causal relationships and pathways from genotypes to phenotypes is of fundamental importance to our understanding of the genetic basis of complex traits, and to many practical applications in human health, animal and plant breeding, and evolutionary studies. One important part of this relationship can be established first by mapping quantitative trait loci (QTL) in a population using genome-wide DNA polymorphisms. In QTL mapping analysis, we try to establish significant associations between phenotypes and genotypes at some specific genomic locations and, through this analysis, to study the effects and interaction of individual QTL. This estimation is necessary for us to understand the genetic basis of quantitative traits and diseases, to identify potential candidate genes for cloning and for further detailed biological studies, and to introgress a QTL allele from one population into another. QTL mapping analysis has also been applied to whole-genome gene expression data to map expression QTL (eQTL) and has become an important component of systems-oriented studies that relate genome-wide DNA variations to gene expressions, to protein concentrations, to metabolite profiles and to clinical traits and phenotypes. This project will aim to develop a statistical analysis system that can map multiple QTL on multiple traits in multiple populations and environments. This is a continuation of our research efforts in the last 15 years. First, we will study a three-stage search procedure for mapping multiple QTL based on their main and/or epistatic effects. This study will solve the problem of mapping epistatic QTL that have weak main effects, a major problem for QTL mapping analysis. Second, we will extend the analysis strategy to multiple traits and develop statistical methods that can analyze multiple QTL on multiple traits, can test pleiotropy vs. linkage on multiple traits, and can test QTL by environment interaction.
Third, we will further extend the methods to multiple populations, combining information from multiple populations, studies and environments for a joint QTL analysis. This will increase the statistical power of QTL detection and resolve the issue of whether QTL detected from multiple traits or multiple studies are the same QTL (pleiotropy) or different QTL (due to linkage). The statistical methods will be based on the multiple interval mapping approach we developed previously, and will be combined with a score-statistic resampling method for evaluating appropriate genome-wide thresholds in each step of the QTL mapping procedure. The methods will be evaluated and tested extensively by simulation study, and implemented in Windows QTL Cartographer. The outputs of the project will be a set of improved statistical methods that can perform comprehensive QTL mapping analysis from crosses of inbred lines, delivered through the computer software Windows QTL Cartographer.
Project Methods
The statistical methods will be based on the multiple interval mapping approach we developed previously. Multiple interval mapping is a maximum likelihood method that fits and tests multiple QTL with epistasis for a joint QTL mapping analysis. There are two problems in the currently implemented approach. One is the lack of a procedure to produce an appropriate criterion for evaluating a QTL in a complex model, and the other is the detection of epistatic QTL that have weak main effects. For the first problem, we will use a score-statistic resampling method that can efficiently produce an appropriate criterion for testing QTL in a complex model. For the second problem, we will develop a three-stage search strategy for detecting QTL. In the first stage, QTL with main effects are scanned and fitted in the model sequentially; in the second stage, epistatic QTL that interact with the identified main-effect QTL are scanned and fitted sequentially; and in the third stage, QTL with only epistatic effects are scanned in pairs and tested. In each step of the search process, a score-statistic resampling method will be used to produce an appropriate genome-wide threshold for testing the new QTL conditional on pre-selected QTL. The methods and procedures will be evaluated and tested extensively by simulation. This method will then be extended to multiple traits to study the genetic basis of trait correlations. Appropriate search strategies and criteria for multiple QTL on multiple traits will be studied and tested by simulation. The main problem to be studied is again the model selection procedure and criterion. Also to be studied are the statistical test of pleiotropy vs. linkage and the test of QTL by environment interaction in a complex model setting.
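As an illustration of how a score-statistic resampling threshold can be obtained, the following is a minimal sketch under simplifying assumptions, not the project's implementation. It tests each marker against a null model with no QTL, whereas the project applies the idea conditionally on pre-selected QTL in a multiple interval mapping model. The function name `score_resampling_threshold`, the marker coding, and all parameters are hypothetical. Each resample multiplies the per-individual score contributions by independent standard normal draws and records the maximum statistic over the genome.

```python
import numpy as np

def score_resampling_threshold(genotypes, phenotype, n_resamples=1000, alpha=0.05, seed=0):
    """Score statistics per marker and a resampled genome-wide threshold.

    genotypes: (n_individuals, n_markers) numeric marker codes (assumed polymorphic).
    phenotype: (n_individuals,) trait values.
    """
    rng = np.random.default_rng(seed)
    x = genotypes - genotypes.mean(axis=0)      # centered marker codes
    r = phenotype - phenotype.mean()            # residuals under the no-QTL null model
    contrib = x * r[:, None]                    # per-individual score contributions
    variance = (contrib ** 2).sum(axis=0)       # empirical variance of each marker's score
    observed = contrib.sum(axis=0) ** 2 / variance

    # Perturb the score contributions with N(0, 1) multipliers; the data are not refitted.
    multipliers = rng.standard_normal((n_resamples, len(r)))
    resampled = (multipliers @ contrib) ** 2 / variance
    # Genome-wide threshold: upper-alpha quantile of the maximum statistic over markers.
    threshold = float(np.quantile(resampled.max(axis=1), 1.0 - alpha))
    return observed, threshold
```

A marker would be declared significant when its observed statistic exceeds the returned threshold. Because only the multipliers change between resamples, the cost is a single matrix product rather than repeated model fitting, which is the efficiency argument usually made for this family of methods.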
The methods will then be further extended to multiple populations, environments or studies for a joint QTL mapping analysis, for a comparative QTL analysis that tests whether detected QTL from different populations, environments or studies are the same or different, and for testing a host of hypotheses that concern the genetic basis and structure of QTL on multiple traits in multiple populations. The main problem here is sorting out the computational challenges and the strategy for merging and testing models from different traits in different populations. Another main effort of the project is the implementation of the methods in Windows QTL Cartographer. The program will be tested and evaluated by simulation and will be freely released to the public for the general use of QTL mapping analysis.
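The three-stage search strategy described in this section can be sketched in a few lines. This is an illustrative simplification, not the Windows QTL Cartographer implementation: it searches markers only (no interval positions), fits ordinary least squares instead of the multiple interval mapping likelihood, and accepts a term when it raises the explained fraction of variance by at least `min_gain`, a hypothetical stand-in for the score-statistic resampling threshold. All function and parameter names are assumptions, and markers are assumed to be coded -1/+1.

```python
import itertools
import numpy as np

def explained_fraction(columns, y):
    """Fraction of the total sum of squares explained by a least-squares fit."""
    design = np.column_stack([np.ones(len(y))] + columns)
    fitted = design @ np.linalg.lstsq(design, y, rcond=None)[0]
    return 1.0 - ((y - fitted) ** 2).sum() / ((y - y.mean()) ** 2).sum()

def three_stage_search(x, y, min_gain=0.05):
    """Return (markers in the model, interacting marker pairs)."""
    qtl, pairs = [], []

    def gain(extra):
        base = [x[:, j] for j in qtl] + [x[:, j] * x[:, k] for j, k in pairs]
        return explained_fraction(base + extra, y) - explained_fraction(base, y)

    def free():
        return [j for j in range(x.shape[1]) if j not in qtl]

    def pick_best(options, new_columns):
        """Best option if its gain reaches min_gain, otherwise None."""
        if not options:
            return None
        best = max(options, key=lambda o: gain(new_columns(o)))
        return best if gain(new_columns(best)) >= min_gain else None

    # Stage 1: add main-effect QTL one at a time.
    while (j := pick_best(free(), lambda j: [x[:, j]])) is not None:
        qtl.append(j)
    # Stage 2: add new QTL that interact with a QTL already in the model.
    while (kj := pick_best([(k, j) for k in qtl for j in free()],
                           lambda o: [x[:, o[1]], x[:, o[0]] * x[:, o[1]]])) is not None:
        pairs.append(kj)
        qtl.append(kj[1])
    # Stage 3: scan pairs of markers not yet in the model for epistasis.
    while (jk := pick_best(list(itertools.combinations(free(), 2)),
                           lambda o: [x[:, o[0]], x[:, o[1]], x[:, o[0]] * x[:, o[1]]])) is not None:
        pairs.append(jk)
        qtl.extend(jk)
    return qtl, pairs
```

On a balanced design with one main-effect marker and one purely epistatic pair, stage 1 finds the main effect, stage 2 finds nothing, and stage 3 recovers the pair that has no main effects of its own. That last case is the one a main-effect-only scan would miss.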

Progress 10/01/09 to 09/30/14

Outputs
Target Audience: Nothing Reported
Changes/Problems: Nothing Reported
What opportunities for training and professional development has the project provided? The project trained several graduate students:
2009 Christine W. Duarte, PhD in Bioinformatics and Statistics, NC State University. “A new method for genetic network reconstruction in expression QTL data sets”
2010 Luciano Da Costa E Silva, PhD in Statistics, NC State University. “Multiple trait multiple interval mapping of quantitative trait loci from inbred line crosses”
2011 Hongjie Zhu, PhD in Bioinformatics and Statistics, NC State University. “Pharmacometabolomics data analysis and nonlinear sufficient dimension reduction for genome-scale studies”
How have the results been disseminated to communities of interest? The results have been published in scientific journals and also implemented and distributed through the very popular computer software Windows QTL Cartographer. The software is widely used in the scientific community for performing statistical analysis of mapping quantitative trait loci.
What do you plan to do during the next reporting period to accomplish the goals? Nothing Reported

Impacts
What was accomplished under these goals?
1. A major research result is the development of a multiple interval mapping method for multiple traits in multiple populations (Silva, Wang and Zeng, 2012 BMC Genetics 13:67). Although many experiments have measurements on multiple traits, most studies have mapped quantitative trait loci (QTL) for each trait separately using single trait analysis. Single trait analysis does not take advantage of possible genetic and environmental correlations between traits. In this paper, we propose a novel statistical method for multiple trait multiple interval mapping (MTMIM) of QTL for inbred line crosses. We also develop a novel score-based method for estimating the genome-wide significance level of putative QTL effects suitable for the MTMIM model. The MTMIM method is implemented in the freely available and widely used Windows QTL Cartographer software. Throughout the paper, we provide compelling empirical evidence that: (1) the score-based threshold maintains a proper type I error rate and tends to keep the false discovery rate within an acceptable level; (2) the MTMIM method can deliver better parameter estimates and power than the single trait multiple interval mapping method; (3) an analysis of a Drosophila dataset illustrates how the MTMIM method can better extract information from datasets with measurements on multiple traits. The MTMIM method represents a convenient statistical framework to test hypotheses of pleiotropic QTL versus closely linked nonpleiotropic QTL and of QTL by environment interaction, and to estimate the total genotypic variance-covariance matrix between traits and decompose it in terms of QTL-specific variance-covariance matrices, thereby providing more detail on the genetic architecture of complex traits.
2. The other major research result is the development of a new method targeted at mapping epistasis of QTL (Laurie et al. submitted).
How to map quantitative trait loci (QTL) with epistasis efficiently and reliably has been a persistent problem for QTL mapping analysis. There are a number of difficulties in studying epistatic QTL. Linkage can pose a significant challenge for finding epistatic QTL reliably. If multiple QTL are in linkage disequilibrium and have interactions, searching for QTL can become a very delicate issue. A commonly used strategy that performs a two-dimensional genome scan to search for a pair of QTL with epistasis can suffer from low statistical power and may also lead to false identification due to complex linkage disequilibrium and interaction patterns. To tackle the problem of complex interaction of multiple QTL with linkage, we developed a three-stage search strategy. In the first stage, main-effect QTL are searched and mapped. In the second stage, epistatic QTL that interact significantly with other identified QTL are searched. In the third stage, new epistatic QTL are searched in pairs. This strategy is based on the consideration that most genetic variance is due to the main effects of QTL. Thus, by first mapping those main-effect QTL, the statistical power for the second and third stages of analysis for mapping epistatic QTL can be maximized. The search for main-effect QTL is robust and does not bias the search for epistatic QTL, owing to the orthogonal property of the genetic model used. The model search criterion is empirically and dynamically evaluated by using a score-statistic based resampling procedure. We demonstrate through simulations that the method has good power and low false discovery in the identification of QTL and epistasis. This method provides an effective and powerful solution for mapping multiple QTL with complex epistatic patterns. The method has been implemented in Windows QTL Cartographer. This will facilitate application of the method to QTL mapping data analysis.
3. A new method for mapping gene expression QTL (Zou and Zeng 2009).
To find the correlations between genome-wide gene expression variations and sequence polymorphisms in inbred cross populations, we developed a statistical method to identify expression quantitative trait loci (eQTL) in a genome. The method is based on multiple interval mapping (MIM), a model selection procedure, and uses the false discovery rate (FDR) to measure the statistical significance of the large number of eQTL. We compared our method with a similar procedure proposed by Storey et al. and found that our method can be more powerful. We identified the features in the two methods that resulted in different statistical powers for eQTL detection, and confirmed them by simulation. We organized our computational procedure in an R package which can estimate FDR for positive findings from similar model selection procedures.
4. A method for inferring causal networks from gene expression QTL mapping data (Duarte-Woods and Zeng 2011). Expression QTL (eQTL) studies involve the collection of microarray gene expression data and genetic marker data from segregating individuals in a population to search for genetic determinants of differential gene expression. Previous studies have found large numbers of trans-regulated genes (regulated by unlinked genetic loci) that link to a single locus or eQTL “hotspot,” and it would be desirable to find the mechanism of coregulation for these gene groups. However, many difficulties exist with current network reconstruction algorithms, such as low power and high computational cost. A common observation for biological networks is that they have a scale-free or power-law architecture. In such an architecture, highly influential nodes exist that have many connections to other nodes. If we assume that this type of architecture applies to genetic networks, then we can simplify the problem of genetic network reconstruction by focusing on discovery of the key regulatory genes at the top of the network.
We introduce the concept of “shielding,” in which a specific gene expression variable (the shielder) renders a set of other gene expression variables (the shielded genes) independent of the eQTL. We iteratively build networks from the eQTL to the shielder down using tests of conditional independence. We have proposed a novel test for controlling the shielder false-positive rate at a predetermined level by requiring a threshold number of shielded genes per shielder. Using simulation, we have demonstrated that we can control the shielder false-positive rate as well as obtain high shielder and edge specificity. In addition, we have shown our method to be robust to violation of the latent variable assumption, an important feature in the practical application of our method. We have applied our method to a yeast expression QTL data set in which microarray and marker data were collected from the progeny of a backcross of two strains of Saccharomyces cerevisiae. Seven genetic networks have been discovered, and bioinformatic analysis of the discovered regulators and corresponding regulated genes has generated plausible hypotheses for mechanisms of regulation that can be tested in future experiments.
5. Windows QTL Cartographer V2.5 (http://statgen.ncsu.edu/qtlcart/WQTLCart.htm). It has been a characteristic of our research in the last twenty years that most of our research results have been implemented and distributed through the computer software QTL Cartographer and Windows QTL Cartographer. This software provides a platform for a comprehensive statistical analysis system for mapping QTL from crosses of inbred lines. It contains all the methods we have developed and some by others. It includes interval mapping, composite interval mapping, multiple interval mapping and Bayesian interval mapping for both continuously distributed and categorical data, for single and multiple traits, and for single and multiple populations.
The new major updates in the last several years include multiple trait multiple interval mapping and a search strategy for mapping epistatic QTL.
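The “shielding” idea in item 4 rests on a conditional independence test: a gene is shielded when it is associated with the eQTL marginally but not once the shielder is accounted for. The sketch below shows one generic way to check this, using a partial correlation with a Fisher z approximation. It is an assumption-laden illustration, not the test proposed in the cited paper, and the names `residualize` and `shielding_check` are hypothetical.

```python
import math
import numpy as np

def residualize(v, given):
    """Residuals of v after least-squares regression on `given` plus an intercept."""
    design = np.column_stack([np.ones(len(v)), given])
    return v - design @ np.linalg.lstsq(design, v, rcond=None)[0]

def shielding_check(eqtl, shielder, gene):
    """Partial correlation of gene and eQTL genotype given the shielder.

    Returns the partial correlation and a two-sided p-value from the
    Fisher z approximation (assumes roughly linear, Gaussian relationships).
    """
    r = np.corrcoef(residualize(gene, shielder), residualize(eqtl, shielder))[0, 1]
    z = math.atanh(r) * math.sqrt(len(gene) - 4)   # n - 3 - (one conditioning variable)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))   # two-sided normal tail probability
    return float(r), p_value
```

A large p-value here is consistent with the shielder rendering the gene independent of the eQTL, while a small one suggests a path from the eQTL to the gene that does not pass through the candidate shielder.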

Publications

  • Type: Journal Articles Status: Published Year Published: 2013 Citation:
    Aylor, D.L. and Z.-B. Zeng (2008) From classic genetics to quantitative genetics to systems biology: modeling epistasis. PLoS Genetics 4(3).
    Garcia, A.A.F., S. Wang, A.E. Melchinger and Z.-B. Zeng (2008) Quantitative trait loci mapping and the genetic basis of heterosis in maize and rice. Genetics 180:1707-1724.
    Zou, W. and Z.-B. Zeng (2008) Statistical methods for mapping multiple QTL. International Journal of Plant Genomics (article ID 286561).
    Zou, W. and Z.-B. Zeng (2009) Multiple interval mapping for gene expression QTL analysis. Genetica 137:125-134.
    Wang, T. and Z.-B. Zeng (2009) Contribution of genetic effects to genetic variance components with epistasis and linkage disequilibrium. BMC Genetics 10:52.
    E Silva, L.D.C. and Z.-B. Zeng (2010) Current progress on statistical methods for mapping quantitative trait loci from inbred line crosses. Journal of Biopharmaceutical Statistics 20(2):454-481.
    Duarte-Woods, C.W. and Z.-B. Zeng (2011) High-confidence discovery of genetic network regulators in expression quantitative trait loci. Genetics 187:955-964.
    E Silva, L.D.S., S.C. Wang and Z.-B. Zeng (2012) Composite interval mapping and multiple interval mapping: procedures and guidelines for using Windows QTL Cartographer. In: Quantitative Trait Loci (QTL): Methods and Protocols, Scott A. Rifkin (Ed). Methods in Molecular Biology Series, Vol. 871, Springer Protocols, Humana Press, New York, p. 75-119.
    Silva, L.D.E., S.C. Wang and Z.-B. Zeng (2012) Multiple trait multiple interval mapping of quantitative trait loci from inbred line crosses. BMC Genetics 13:67 (DOI: 10.1186/1471-2156-13-67).
    Laurie, C., S. Wang, L.A. Carlini-Garcia and Z.-B. Zeng (2013) Mapping epistatic quantitative trait loci. (submitted)
    Wang, S., C. Basten and Z.-B. Zeng (2012) Windows QTL Cartographer. Department of Statistics, North Carolina State University, Raleigh, NC.