Source: NORTH CAROLINA STATE UNIV submitted to NRP
ESTIMATING GENETIC ARCHITECTURE OF QUANTITATIVE TRAITS
Sponsoring Institution
National Institute of Food and Agriculture
Project Status
ACTIVE
Funding Source
Reporting Frequency
Annual
Accession No.
0200050
Grant No.
(N/A)
Cumulative Award Amt.
(N/A)
Proposal No.
(N/A)
Multistate No.
(N/A)
Project Start Date
Oct 1, 2004
Project End Date
Sep 30, 2009
Grant Year
(N/A)
Program Code
[(N/A)]- (N/A)
Recipient Organization
NORTH CAROLINA STATE UNIV
(N/A)
RALEIGH, NC 27695
Performing Department
STATISTICS
Non Technical Summary
In this project we develop statistical methods and computer programs for mapping quantitative trait loci (QTL) in a population and for estimating the genetic effect network linking QTL, gene expression profiles and quantitative traits. We also study the robustness and limitations of these methods so that inferences drawn from them are well informed. The methods and programs will help us understand the genetic basis of quantitative trait variation.
Animal Health Component
(N/A)
Research Effort Categories
Basic
50%
Applied
(N/A)
Developmental
50%
Classification

Knowledge Area (KA) | Subject of Investigation (SOI) | Field of Science (FOS) | Percent
901 | 7310 | 1080 | 50%
901 | 7310 | 2090 | 50%
Goals / Objectives
Establishing a causal relationship between genotypes and phenotypes is of fundamental importance to our understanding of the genetic basis of quantitative traits and to many practical applications, including animal and plant breeding. Mapping quantitative trait loci (QTL) is the first step toward the functional inference of this relationship. The relationship is called the genetic architecture of quantitative traits, and it is specified by parameters that include the number and genomic locations of QTL, the frequencies, associations, effects and interactions of QTL alleles, QTL-by-environment interaction, and other parameters linking genotypes to phenotypes. The general objective of this project is to develop theory and statistical methods to characterize and analyze the genetic basis and structure of quantitative trait variation in experimental and natural populations using genomic data. The proposed research has three parts. The first part is to further study statistical issues related to multiple interval mapping of QTL, such as model selection criteria and the practical meaning and interpretation of mapping results. We also propose to extend multiple interval mapping to multiple traits in multiple environments and populations, and to categorical traits. We will also study issues related to modeling QTL effects in a population in linkage disequilibrium. The second part is to develop statistical methods that combine molecular marker data, microarray gene expression data and quantitative trait data for gene expression QTL analysis and for constructing a genetic effect network from QTL to gene expression profiles to quantitative trait phenotypes. The third part is to explore statistical methods for genome-wide inference of the relationship between genotypes and phenotypes using single nucleotide polymorphisms (SNP) in a natural random-mating population.
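To make the interval-mapping idea concrete, the following minimal Python sketch shows its two basic ingredients for a single QTL in a backcross: the probability of the QTL genotype given its flanking markers, and the normal-mixture log-likelihood of the phenotypes. It assumes 0/1 genotype coding, no crossover interference and a common residual variance; all function names are hypothetical, and the sketch is not the QTL Cartographer implementation. Multiple interval mapping fits several such QTL, and their interactions, jointly.

```python
import math

def haldane_r(d_morgans):
    """Recombination fraction for a map distance in Morgans (Haldane, no interference)."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_morgans))

def qtl_prob_backcross(m1, m2, r1, r2):
    """P(QTL genotype = 1 | flanking markers m1, m2) in a backcross.

    Genotypes are coded 1 (heterozygote) or 0 (homozygote); r1 and r2 are the
    marker1-QTL and QTL-marker2 recombination fractions (no interference)."""
    def joint(q):
        # probability of the marker-QTL-marker configuration, up to a constant
        p1 = (1.0 - r1) if q == m1 else r1
        p2 = (1.0 - r2) if q == m2 else r2
        return p1 * p2
    return joint(1) / (joint(1) + joint(0))

def normal_pdf(y, mu, sigma):
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def interval_loglik(ys, probs, mu1, mu0, sigma):
    """Single-QTL normal-mixture log-likelihood: individual i is N(mu1, sigma^2)
    with probability probs[i] (QTL heterozygote) and N(mu0, sigma^2) otherwise."""
    return sum(math.log(p * normal_pdf(y, mu1, sigma) + (1.0 - p) * normal_pdf(y, mu0, sigma))
               for y, p in zip(ys, probs))
```

For example, with both flanking markers heterozygous and r1 = r2 = 0.1, the QTL is heterozygous with probability 0.81/0.82, or about 0.988.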
The main emphasis of the research is to treat the overall structure of quantitative trait variation in a population as a single entity when mapping individual QTL and when studying the effects, epistasis and pleiotropy of QTL, genetic correlations between traits, QTL-by-environment interaction, and the genetic effect network from QTL to gene expression to quantitative traits. Many of the statistical methods will be implemented in QTL Cartographer, the computer software we developed, which is widely used in the scientific community.
Project Methods
We will extend the multiple interval mapping method we developed for mapping multiple QTL on a single trait to multiple traits in multiple environments and to categorical traits, and implement the new methods in QTL Cartographer. We will study many issues related to the statistical inference of the genetic architecture of quantitative traits in QTL studies, such as the robustness of model selection and the appropriate interpretation of mapping analyses with respect to the number, positions, effects and interactions of the QTL identified. For gene expression QTL analysis, we will explore a number of statistical methods that link genotype data, gene expression data and phenotypic trait data through a genetic effect network from QTL to gene expression phenotypes to quantitative trait phenotypes. We will develop, implement and compare two complementary approaches for expression QTL (eQTL) mapping analysis. The first applies QTL analysis to the principal components of a cluster of highly correlated gene expression profiles to identify candidate QTL that co-regulate the expression of the genes in the cluster; this could serve as an initial approach for QTL identification. The second uses multiple trait multiple interval mapping to estimate the details of the genetic effect network for a set of gene expression profiles and/or trait phenotypes. We will also develop a regression-based method to associate multiple gene expression profiles with quantitative traits and combine it with multiple interval mapping on the traits. Finally, we will explore the possibility of using single nucleotide polymorphisms (SNP) to infer the genetic architecture of a quantitative trait directly in a natural random-mating population.
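The first eQTL approach (QTL analysis on the principal components of a co-expressed gene cluster) can be sketched as follows. This is a hypothetical, simplified illustration: the first principal component is found by power iteration on the sample covariance matrix, and a single-marker R² scan stands in for a full interval-mapping analysis of the component scores. The simulated data (three genes driven by marker 0) and all function names are invented for illustration only.

```python
import random

def simulate_cluster(n, seed=0):
    """Toy data (invented for illustration): 2 markers per individual, and a
    cluster of 3 co-expressed genes whose expression is driven by marker 0."""
    rng = random.Random(seed)
    genotypes = [[rng.randint(0, 1), rng.randint(0, 1)] for _ in range(n)]
    expression = [[2.0 * g[0] + rng.gauss(0.0, 1.0) for _ in range(3)] for g in genotypes]
    return genotypes, expression

def first_pc_scores(x, iters=200):
    """Scores of each individual on the first principal component of x
    (rows = individuals, columns = genes), found by power iteration."""
    n, m = len(x), len(x[0])
    means = [sum(row[j] for row in x) / n for j in range(m)]
    xc = [[row[j] - means[j] for j in range(m)] for row in x]
    cov = [[sum(r[a] * r[b] for r in xc) / (n - 1) for b in range(m)] for a in range(m)]
    v = [1.0] * m
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(m)) for a in range(m)]
        norm = sum(c * c for c in w) ** 0.5
        v = [c / norm for c in w]
    return [sum(r[j] * v[j] for j in range(m)) for r in xc]

def r_squared(x, y):
    """Squared Pearson correlation, i.e. R^2 of a simple linear regression."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)

def scan_markers(scores, genotypes):
    """Single-marker scan of PC scores: R^2 for each marker column."""
    return [r_squared([g[k] for g in genotypes], scores) for k in range(len(genotypes[0]))]
```

On such simulated data the PC1 scores map to marker 0, the shared regulator, and not to marker 1. Because R² is invariant to the sign of the component, the arbitrary sign returned by power iteration does not matter.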
We will study issues in modelling QTL with epistasis and linkage disequilibrium (LD) in a natural population, and develop statistical methods to infer the LD relationships among a set of markers and multiple QTL and to estimate the genetic architecture parameters of a quantitative trait in a natural population, including the frequencies, effects, interactions and LD of QTL.
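As a baseline for the marker-LD inference described above, the standard pairwise measures D, D' and r² can be computed from phased haplotypes as in this minimal sketch. It assumes biallelic markers coded 0/1; the function name is hypothetical, and D' is returned with its sign. The multilocus methods pursued in this project go beyond such pairwise summaries.

```python
def pairwise_ld(haplotypes, i, j):
    """D, signed D' and r^2 between biallelic markers i and j from phased 0/1 haplotypes."""
    n = len(haplotypes)
    p_a = sum(h[i] for h in haplotypes) / n
    p_b = sum(h[j] for h in haplotypes) / n
    p_ab = sum(1 for h in haplotypes if h[i] == 1 and h[j] == 1) / n
    d = p_ab - p_a * p_b  # deviation of the haplotype frequency from independence
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2
```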

Progress 10/01/06 to 09/30/07

Outputs
OUTPUTS: Two projects were conducted in the past year. The first measures and partitions high-order linkage disequilibrium using multiple order Markov chains. A map of the background levels of disequilibrium between nearby markers can be useful for association mapping studies. To assess background levels of linkage disequilibrium (LD), multilocus LD measures are more informative than pairwise LD measures, because a combined analysis of pairwise measures cannot detect simultaneous allele associations among multiple markers. Various haplotype-based multilocus LD measures have been proposed. However, most of them provide a single index of association among multiple markers and do not reveal the complex patterns and different levels of LD structure. In this study we employ non-homogeneous multiple order Markov chain (MOMC) models as a statistical framework to measure the LD among multiple markers and partition it into components due to different orders of marker association. Using a sliding window of multiple markers on phased haplotype data, we compute the likelihoods of Markov chain (MC) models of different orders in each window. The log-likelihood difference between the lowest-order model (MC0) and the highest-order model in each window is used as a measure of the total LD, that is, the overall deviation from gametic equilibrium within the window. We then partition the total LD into lower-order disequilibria and estimate the contributions of two-, three- and higher-order disequilibria. We also explore the relationship between different orders of LD and the log-likelihood difference between two MC models of different orders. Applying the method to phased haplotype data from the ENCODE regions of the HapMap project, we are able to identify regions of high and low multilocus LD. Our results show that most of the LD in the HapMap data is attributable to LD between adjacent pairs of markers across the whole region.
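The windowed likelihood comparison described above can be sketched in Python as follows. This is a hypothetical simplification, not the authors' software: each window is a list of phased 0/1 haplotypes, the order-k model lets the allele at each position depend on the previous k alleles with position-specific (non-homogeneous) transition probabilities estimated by maximum likelihood, and the LD component assigned to each order is assumed to be the log-likelihood gain over the next-lower order, so the increments sum to the total LD by construction.

```python
import math
from collections import Counter

def markov_loglik(window, order):
    """Maximized log-likelihood of phased haplotypes under a non-homogeneous
    Markov chain of the given order (position-specific transition probabilities)."""
    length = len(window[0])
    ll = 0.0
    for t in range(length):
        k = min(order, t)  # near the window start, fewer previous markers exist
        ctx_counts = Counter(tuple(h[t - k:t]) for h in window)
        pair_counts = Counter((tuple(h[t - k:t]), h[t]) for h in window)
        for (ctx, _), c in pair_counts.items():
            ll += c * math.log(c / ctx_counts[ctx])
    return ll

def partition_ld(window):
    """Total LD (highest-order minus order-0 log-likelihood) and its split into
    increments between successive Markov chain orders."""
    max_order = len(window[0]) - 1
    lls = [markov_loglik(window, k) for k in range(max_order + 1)]
    total = lls[-1] - lls[0]
    increments = [lls[k] - lls[k - 1] for k in range(1, max_order + 1)]
    return total, increments

def sliding_windows(haplotypes, width):
    """Yield windows of `width` consecutive markers from 0/1 haplotype lists."""
    for start in range(len(haplotypes[0]) - width + 1):
        yield [h[start:start + width] for h in haplotypes]
```

Dividing each increment by the window total gives the kind of per-order percentages reported for the ENCODE regions, under the stated assumption about how orders map to increments.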
We also find that as the multilocus total LD increases, the effects of high-order LD tend to weaken, owing to the lack of observed multilocus haplotypes. The overall estimates of first-, second-, third- and fourth-order LD across the ENCODE regions are 64%, 23%, 9% and 3%, respectively. The second project is a collaborative effort to develop a semiparametric method for composite functional mapping of dynamic quantitative traits. In this study, we present a statistical framework for mapping QTL that affect dynamic traits. Functional mapping models the time-dependent genetic effects of a QTL tested within a marker interval with a biologically meaningful parametric function, whereas composite interval mapping controls for the genomic background by modeling the time-dependent genetic effects of markers outside the test interval with a flexible nonparametric approach based on Legendre polynomials. This semiparametric framework is formulated as a maximum-likelihood model and implemented with the EM algorithm, allowing estimation and testing of the mathematical parameters that define the QTL effects and of the Legendre polynomial regression coefficients that describe the marker effects.
PARTICIPANTS: Zhao-Bang Zeng, Bioinformatics Research Center, North Carolina State University. Yunjung Kim, Bioinformatics Research Center, North Carolina State University. Sheng Feng, Department of Biostatistics and Bioinformatics, Duke University. Runqing Yang, Huijiang Gao, Xin Wang and Ji Zhang, School of Agriculture and Biology, Shanghai Jiaotong University, China. Rongling Wu, Department of Statistics, University of Florida at Gainesville. Sheng Feng is a former graduate student of Zhao-Bang Zeng; Yunjung Kim is a current graduate student of Zhao-Bang Zeng.
TARGET AUDIENCES: The study is targeted to geneticists and animal and plant breeders. It may also motivate statistical geneticists and statisticians to pursue further study.
PROJECT MODIFICATIONS: None.
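The semiparametric mean structure of the composite functional mapping project described above (a parametric curve for the QTL plus a Legendre-polynomial term for the genomic background) can be illustrated with the sketch below. The logistic growth curve used for the QTL part is an assumption chosen for illustration, since the report only calls it "a biologically meaningful parametric function"; the maximum-likelihood fitting with the EM algorithm is not shown, and all function names are hypothetical.

```python
import math

def legendre(order, x):
    """[P_0(x), ..., P_order(x)] via Bonnet's recursion
    (n + 1) P_{n+1}(x) = (2n + 1) x P_n(x) - n P_{n-1}(x)."""
    values = [1.0, x][: order + 1]
    for n in range(1, order):
        values.append(((2 * n + 1) * x * values[n] - n * values[n - 1]) / (n + 1))
    return values

def rescale(t, t_min, t_max):
    """Map a time point from [t_min, t_max] onto [-1, 1], where Legendre polynomials are defined."""
    return 2.0 * (t - t_min) / (t_max - t_min) - 1.0

def logistic(t, a, b, r):
    """Assumed parametric QTL curve: logistic growth a / (1 + b * exp(-r * t))."""
    return a / (1.0 + b * math.exp(-r * t))

def mean_curve(times, qtl_params, background_coefs):
    """Expected trait value over time for one QTL genotype: parametric QTL curve
    plus a background term sum_k beta_k * P_k(rescaled t). Needs >= 2 distinct times."""
    t_min, t_max = min(times), max(times)
    order = len(background_coefs) - 1
    curve = []
    for t in times:
        basis = legendre(order, rescale(t, t_min, t_max))
        background = sum(beta * p for beta, p in zip(background_coefs, basis))
        curve.append(logistic(t, *qtl_params) + background)
    return curve
```

The split mirrors the design choice in the text: a few interpretable parameters for the tested QTL, and a flexible low-order polynomial for marker effects whose shape over time is not of direct interest.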

Impacts
Using multiple order Markov chains (MOMC) to measure and partition different levels of linkage disequilibrium (LD) is a novel idea. It is the first practical method that tells us how much of the total LD in a region can be attributed to two-locus LD, how much to three-locus LD, and so on. Applied to human HapMap data, it shows for the first time where regions of high-order LD are located, which could improve our biological understanding of what causes them. The MOMC model may also help us better associate disease gene variation with multiple markers in a genomic region. The study of the composite interval mapping method for dynamic traits has the potential to provide a practical solution to a complex problem.

Publications

  • Yang, R., H. Gao, X. Wang, J. Zhang, Z.-B. Zeng and R. Wu (2007) A semiparametric approach for composite functional mapping of dynamic quantitative traits. Genetics 177: 1859-1870.
  • Kim, Y., S. Feng and Z.-B. Zeng (2007) Measuring and partitioning the high order linkage disequilibrium by multiple order Markov chains. Genetic Epidemiology (in press).


Progress 10/01/05 to 09/30/06

Outputs
We made significant progress in several directions. We have developed new theory for modeling QTL with epistasis and linkage disequilibrium and provided new perspectives on interpreting QTL models, particularly in relation to epistasis and linkage disequilibrium. We have also developed a new multiple interval mapping method for categorical traits (CT-MIM). CT-MIM has been implemented and released in the current version of Windows QTL Cartographer (http://statgen.ncsu.edu/qtlcart/WQTLCart.htm). In our research on gene expression QTL (eQTL) mapping data analysis, we have worked out several new ideas and procedures to help interpret mapping results.
Prioritizing candidate genes in eQTL regions: There are several ways to annotate eQTL mapping results. One is to numerically prioritize the genes in each eQTL gene list as potential causal genes for the eQTL. We have recently begun developing a Bayesian algorithm for this purpose. The algorithm first takes information from prior studies and annotations of gene relationships (such as GO classifications, KEGG relationships and ChIP-on-chip study information) and weighs it to produce a raw gene-pair relationship score matrix. It then uses this score matrix as a prior and combines it with the gene-pair information from the whole eQTL mapping study in a recursive re-weighting scheme to produce the final priority scores. The information is reinforced particularly for eQTL that affect multiple expression traits. Although this information is only suggestive, it can play an important role in generating testable hypotheses for a variety of applications.
MT-MIM for eQTL mapping: We are currently adapting our multiple trait multiple interval mapping (MT-MIM) for eQTL analysis. This is a complex procedure that can serve several purposes.
First, it can be used to estimate the contribution of individual eQTL to genetic correlations between expression traits. Second, MT-MIM has the potential to improve the search for eQTL by analyzing a pair or a set of expression traits simultaneously. This research is still in progress.
Using the QTL shielding test (QST) to infer genetic pathways: For pathway inference, we recently developed a QTL shielding test (QST) to infer whether the relationship between a QTL and multiple (expression and/or trait) phenotypes is better described by a pathway network or by a star network. The test considers one QTL at a time, say Q, and two or more expression or trait phenotypes that share this QTL, say Y1 and Y2, and asks whether Y1 shields the effect of Q on Y2, i.e. whether Y1 lies on the pathway from Q to Y2. We studied various statistical issues for the test and showed that it works very effectively for one Q and two Y's. We are currently extending the test and analysis to multiple Q's and Y's. This is a promising approach for robustly inferring sub-networks with relatively strong causal relationships and pathway structures between eQTL and their target traits.
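The shielding idea behind the QST can be illustrated with a partial-regression sketch: if Y1 lies on the pathway from Q to Y2, the effect of Q on Y2 should largely vanish once Y1 is included as a covariate, whereas in a star network it should remain. This is a hypothetical simplification for illustration, not the QST statistic itself; it computes least-squares coefficients only (no significance test) on simulated data with assumed unit effect sizes, and all names are invented.

```python
import random

def _centered(v):
    m = sum(v) / len(v)
    return [x - m for x in v]

def marginal_effect(q, y):
    """Least-squares slope of y on QTL genotype q (Q's total effect on y)."""
    qc, yc = _centered(q), _centered(y)
    return sum(a * b for a, b in zip(qc, yc)) / sum(a * a for a in qc)

def conditional_effect(q, y1, y2):
    """Coefficient of q when y2 is regressed on both q and y1.

    Near zero if y1 'shields' q from y2 (pathway Q -> Y1 -> Y2);
    near the marginal effect if Q acts on Y2 directly (star network)."""
    qc, c1, c2 = _centered(q), _centered(y1), _centered(y2)
    sqq = sum(a * a for a in qc)
    s11 = sum(b * b for b in c1)
    sq1 = sum(a * b for a, b in zip(qc, c1))
    sq2 = sum(a * c for a, c in zip(qc, c2))
    s12 = sum(b * c for b, c in zip(c1, c2))
    # Cramer's rule on the 2x2 normal equations [[sqq, sq1], [sq1, s11]] beta = [sq2, s12]
    return (sq2 * s11 - sq1 * s12) / (sqq * s11 - sq1 * sq1)

def simulate(n, pathway, seed=0):
    """Toy data: genotype q in {0, 1}; Y1 = q + noise;
    Y2 = Y1 + noise (pathway) or Y2 = q + noise (star)."""
    rng = random.Random(seed)
    q = [rng.randint(0, 1) for _ in range(n)]
    y1 = [qi + rng.gauss(0.0, 1.0) for qi in q]
    if pathway:
        y2 = [a + rng.gauss(0.0, 1.0) for a in y1]
    else:
        y2 = [qi + rng.gauss(0.0, 1.0) for qi in q]
    return q, y1, y2
```

In the pathway simulation the marginal effect of Q on Y2 is close to 1 while the conditional effect is close to 0; in the star simulation the conditional effect stays close to 1.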

Impacts
The theoretical study of QTL modeling provides the basis and framework for estimating and interpreting the genetic basis of quantitative trait variation. It is important for our understanding and appropriate interpretation of the genetic effects of multiple interacting QTL when both epistasis and linkage disequilibrium are present in a mapping population. Our study resolved a number of long-standing problems and widespread confusions and misunderstandings in the quantitative genetics community, such as the misunderstanding that QTL effects estimated from some models can be independent of the study population. This is also the first study to take both epistasis and linkage disequilibrium into account in a systematic study of QTL modeling. The studies on eQTL analysis, gene annotation and prioritization, and network inference are pioneering research that will have a significant impact on future developments in these areas and lay groundwork for systems-oriented genetic inference.

Publications

  • Li, J., S. Wang and Z.-B. Zeng (2006) Multiple interval mapping for ordinal traits. Genetics 173: 1649-1663.
  • Wang, T., and Z.-B. Zeng (2006) Modeling quantitative trait loci with epistasis and linkage disequilibrium in experimental and natural populations. BMC Genetics 7:9.


Progress 10/01/04 to 09/30/05

Outputs
Progress was made on two parts. In the first part, we systematically studied a number of theoretical and statistical issues related to modeling quantitative trait loci in cross populations and natural populations. A quantitative genetic model relates the genotypic value of an individual to the alleles at the loci that contribute to variation in a population, in terms of additive, dominance and epistatic effects. This partition of genetic effects is related to the partition of genetic variance, and it is the basis for analyzing and interpreting the genetic basis of quantitative trait variation. A number of models have been proposed to describe this relationship; some are based on the orthogonal partition of genetic variance in an equilibrium population. We (Zeng, Wang and Zou, 2005) compared several representative models and discussed their utility and potential problems for analyzing quantitative trait loci (QTL) in a segregating population. An orthogonal model implies that, in an equilibrium population, estimates of the genetic effects are consistent between a full and a reduced model and are directly related to the partition of the genetic variance in the population. Linkage disequilibrium does not affect the estimation of genetic effects in a full model, but it would in a reduced model. Linkage disequilibrium would certainly complicate the detection of QTL and epistasis. The choice of model does not influence the detection of QTL and epistasis; it does, however, influence the estimation and interpretation of genetic effects. We studied extensively the composition and properties of the genetic model parameters, such as genetic effects and the partition of genetic variance, when both epistasis and linkage disequilibrium are considered.
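The statement that linkage disequilibrium leaves full-model estimates unaffected but would change reduced-model estimates can be checked with a small population-level calculation. The sketch assumes a deliberately simple gametic (haploid) model with two biallelic loci, allele indicators x1 and x2 coded 0/1, purely additive effects and no epistasis; it is an illustration under those assumptions, not the models of Zeng, Wang and Zou (2005), and the function names are hypothetical. In this setting the reduced-model coefficient is a1 + a2·D/(p1(1 − p1)).

```python
def ld_moments(hap_freqs):
    """Allele-indicator variances and covariance (= D) from two-locus haplotype
    frequencies keyed by (x1, x2) with x1, x2 in {0, 1}."""
    p1 = sum(f for (a, _), f in hap_freqs.items() if a == 1)
    p2 = sum(f for (_, b), f in hap_freqs.items() if b == 1)
    d = hap_freqs.get((1, 1), 0.0) - p1 * p2  # linkage disequilibrium coefficient D
    return p1 * (1.0 - p1), p2 * (1.0 - p2), d

def fitted_effects(hap_freqs, a1, a2):
    """Population least-squares effects when the genotypic value is G = a1*x1 + a2*x2.

    Returns (full-model a1, full-model a2, reduced-model a1 fitting x1 alone)."""
    v1, v2, d = ld_moments(hap_freqs)
    cov_g1 = a1 * v1 + a2 * d  # Cov(G, x1)
    cov_g2 = a1 * d + a2 * v2  # Cov(G, x2)
    det = v1 * v2 - d * d
    full_a1 = (cov_g1 * v2 - d * cov_g2) / det
    full_a2 = (v1 * cov_g2 - d * cov_g1) / det
    reduced_a1 = cov_g1 / v1  # absorbs part of a2 whenever D != 0
    return full_a1, full_a2, reduced_a1
```

With haplotype frequencies 0.4, 0.1, 0.1 and 0.4 for (1,1), (1,0), (0,1) and (0,0) (so D = 0.15), a1 = 1 and a2 = 2, the full model returns (1, 2) while the reduced model returns 2.2 = 1 + 2 × 0.15/0.25; at equilibrium (D = 0) both models give a1 = 1.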
A clear picture of how these parameters are composed helps us understand the structure of genetic parameters and how various genetic quantities, such as allele frequencies and linkage disequilibrium, enter the definition of genetic effects; it will also help us properly interpret estimates of genetic effects and variance components in a QTL mapping experiment. The second part is the study of Liu and Zeng (2005). In this study, a set of mixture model equations was derived from the normal mixture model and the EM algorithm for evaluating linear models with uncertain independent variables. The derived equations can be seen as an extension of Henderson's mixed model equations to mixture models and provide a general framework for dealing with uncertain incidence matrices in linear models. The mixture model equations were applied to marker-assisted genetic evaluation with different parameterizations of QTL effects. Compared with Henderson's mixed model equations, the mixed-effect mixture model equations are flexible in modelling QTL effects and show desirable properties in estimating them.
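For reference, the sketch below writes Henderson's mixed model equations for a simple model y = μ + u_j + e with random class effects u_j, and shows one way uncertain class membership can enter them: the incidence cross-products are replaced by their expectations under given membership probabilities. This is a hypothetical simplification under our own assumptions (one fixed mean, a known variance ratio lam = σe²/σu², probabilities held fixed rather than updated by EM), not the equations derived by Liu and Zeng (2005). With 0/1 probabilities it reduces exactly to Henderson's equations.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def mixture_mme(y, probs, lam):
    """Mixed-model equations for y = mu + u_j + e when the class j of each individual
    (e.g. a QTL genotype) is known only through probabilities probs[i][j].

    Z'Z, Z'1 and Z'y are replaced by their expectations; because each row of Z has a
    single 1, E[Z'Z] is diagonal with the column sums of probs."""
    n, q = len(y), len(probs[0])
    col_sums = [sum(p[j] for p in probs) for j in range(q)]
    a = [[float(n)] + col_sums]
    for j in range(q):
        a.append([col_sums[j]] + [(col_sums[j] + lam) if k == j else 0.0 for k in range(q)])
    rhs = [sum(y)] + [sum(p[j] * yi for p, yi in zip(probs, y)) for j in range(q)]
    sol = solve(a, rhs)
    return sol[0], sol[1:]
```

With y = (1, 2, 3, 4), certain classes (first two individuals in class 0, last two in class 1) and lam = 1, this gives μ = 2.5 and u = (−2/3, 2/3); with membership probabilities of 0.9 instead of 1, the class effects shrink to ±1.6/3.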

Impacts
The theoretical study of QTL modeling provides the basis and framework for estimating and interpreting the genetic basis of quantitative trait variation. It is important for understanding and appropriately estimating the genetic effects of multiple interacting QTL when both epistasis and linkage disequilibrium are present in a mapping population. Our study resolved a number of long-standing problems and widespread confusions and misunderstandings in the quantitative genetics community, such as the misunderstanding that QTL effects estimated from some models can be independent of the study population. This is also the first study to take both epistasis and linkage disequilibrium into account in a systematic study of QTL modeling.

Publications

  • Zeng, Z.-B., T. Wang and W. Zou (2005) Modeling quantitative trait loci and interpretation of models. Genetics 169:1711-1725.
  • Liu, L. and Z.-B. Zeng (2005) Mixture model equations for marker-assisted genetic evaluation. J. Anim. Breed. Genet. 122: 229-239.