Source: UNIV OF WISCONSIN submitted to
ADVANCED TECHNOLOGIES FOR THE GENETIC IMPROVEMENT OF POULTRY
Sponsoring Institution
National Institute of Food and Agriculture
Project Status
TERMINATED
Funding Source
Reporting Frequency
Annual
Accession No.
0210569
Grant No.
(N/A)
Project No.
WIS01203
Proposal No.
(N/A)
Multistate No.
NC-1008
Program Code
(N/A)
Project Start Date
Jun 1, 2007
Project End Date
Sep 30, 2008
Grant Year
(N/A)
Project Director
Rosa, G.
Recipient Organization
UNIV OF WISCONSIN
21 N PARK ST STE 6401
MADISON,WI 53715-1218
Performing Department
DAIRY SCIENCE
Non Technical Summary
Microarray technology allows the monitoring of expression levels in cells for thousands of genes simultaneously. Microarrays have been increasingly used in agricultural research to study genetic mechanisms governing variation in traits of economic importance. As microarray experiments are expensive and time consuming, they are generally performed with a relatively small number of animals. Current statistical models used to analyze microarray data treat genes as independent entities so resulting statistical tests have low power and precision. This project aims to develop a novel microarray data analysis tool, using genomic and biological knowledge regarding the genes to improve the efficiency of microarray experiments.
Animal Health Component
(N/A)
Research Effort Categories
Basic
75%
Applied
25%
Developmental
(N/A)
Classification

Knowledge Area (KA)   Subject of Investigation (SOI)   Field of Science (FOS)   Percent
303                   3999                             1080                     10%
303                   3999                             2090                     10%
304                   3999                             1080                     30%
304                   3999                             2090                     50%
Goals / Objectives
The first objective is to develop high resolution integrated maps to facilitate the identification of poultry genes and other DNA sequences of economic importance. The next objective is to develop methods for locating new genetic variation in poultry by gene transfer and chromosome alteration. The last objective is to develop, compare, and integrate emerging technologies with classical quantitative genetics for improvement of economic traits in poultry.
Project Methods
Microarray technology has been increasingly used in agricultural research to study genetic mechanisms governing variation in traits of economic importance, such as disease resistance, growth, and reproduction. As microarray experiments are still expensive and time consuming, they are generally performed with a relatively small number of samples (animals). Nonetheless, microarray trials generate a massive amount of data, since thousands of genes are monitored simultaneously on each slide. Another feature of microarray assays is that they generally involve multiple sources of systematic effects, such as variation among slides or differences caused by dye labeling of RNA samples. Hence, the analysis of such experiments requires statistical tools tailored to deal with data sets of unprecedented complexity and dimensionality. Specifically, for the comparison of expression profiles across groups or populations, mixed effects ANOVA models are the most popular, due to their flexibility and ease of use, as well as the availability of software for their implementation. Current ANOVA models, however, treat genes as independent entities, ignoring the natural covariance among their expression levels due to co-regulation processes. As a consequence of violating the independence assumption, the resulting statistical tests have low power and fold change estimates have low precision. Multivariate statistical models would be more appropriate for the joint analysis of multiple genes, but the much larger number of genes relative to the number of samples makes traditional whole-genome multivariate ANOVA models simply infeasible. In this project we propose to develop a novel microarray data analysis methodology, which will take advantage of prior genomic and biological knowledge to improve the efficiency of microarray screening for differentially expressed genes.
The approach will consist of two basic steps. First, publicly available genomic information, such as gene function and pathway membership, will be used to partition genes into subgroups of potentially co-regulated genes. Next, multivariate ANOVA analyses will be implemented within each subset of genes, and information on gene sequence similarities and common promoter elements will be utilized to develop parsimonious covariance structures among genes.
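The first step described above, partitioning genes into subgroups of potentially co-regulated genes, can be illustrated with a minimal Python sketch. It is not the project's implementation. It assumes that two genes belong to the same subgroup whenever they are linked, directly or through a chain of other genes, by a shared pathway annotation. The gene names, pathway names, and the function `partition_by_shared_annotation` are hypothetical and used only for illustration.

```python
from collections import defaultdict


def partition_by_shared_annotation(gene_to_pathways):
    """Group genes linked, directly or transitively, by a shared pathway.

    Assumption for this sketch: sharing any pathway annotation marks two
    genes as potentially co-regulated. Genes with no annotation end up in
    singleton subgroups.
    """
    parent = {gene: gene for gene in gene_to_pathways}

    def find(gene):
        # Follow parent links to the representative, shortening the path.
        while parent[gene] != gene:
            parent[gene] = parent[parent[gene]]
            gene = parent[gene]
        return gene

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Merge every gene with the first gene seen in each of its pathways.
    first_gene_in_pathway = {}
    for gene, pathways in gene_to_pathways.items():
        for pathway in pathways:
            if pathway in first_gene_in_pathway:
                union(first_gene_in_pathway[pathway], gene)
            else:
                first_gene_in_pathway[pathway] = gene

    groups = defaultdict(list)
    for gene in gene_to_pathways:
        groups[find(gene)].append(gene)
    return sorted(sorted(members) for members in groups.values())


# Hypothetical annotations, not taken from any real database.
annotations = {
    "geneA": {"glycolysis"},
    "geneB": {"glycolysis", "TCA cycle"},
    "geneC": {"TCA cycle"},
    "geneD": {"DNA repair"},
    "geneE": set(),
}
subgroups = partition_by_shared_annotation(annotations)
```

In this toy input, geneA, geneB, and geneC form one subgroup because geneB bridges the two pathways, while geneD and geneE remain alone. A multivariate analysis would then be fitted within each subgroup, so the size of the covariance matrices is bounded by the largest subgroup rather than by the whole genome.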

Progress 06/01/07 to 09/30/08

Outputs
OUTPUTS: During the reporting year we developed a statistical methodology to incorporate biological information related to gene networks into the models for analyzing microarray gene expression data. In our approach, a multivariate normal prior on gene effects is used, with a covariance structure derived from the network, such that there is some borrowing of information across genes when estimating differential expression (i.e. fold changes) and testing hypotheses regarding such changes. A software package written in the R language was developed to implement the proposed methodology, as well as the traditional gene-specific analyses with shrinkage estimators of variance components. The software also performs leave-one-out cross-validation for the comparison of the alternative data analysis strategies.
PARTICIPANTS: Dr. Guilherme J. M. Rosa: Dr. Rosa is the PI of the project. He worked directly on the development of the methodology related to the incorporation of biological information into the models to analyze gene expression, and supervised the whole project. Mrs. Ana Ines Vazquez: Mrs. Vazquez is a PhD student at the University of Wisconsin. She was responsible for writing the software package and for implementing a first set of simulations and the cross-validation involving the real data.
TARGET AUDIENCES: Anyone who uses microarray experiments to better understand, and test hypotheses about, gene activity and regulation in animals under different environmental conditions or stress factors would be interested in, and would benefit from, the models we are developing in our project.
PROJECT MODIFICATIONS: Not relevant to this project.

Impacts
Current methods for identifying differentially-expressed genes in microarray experiments typically treat genes as independent entities and do not take advantage of prior genomic or biological knowledge, such as common regulatory DNA sequence elements, gene function, or pathway membership. As a consequence, the resulting statistical tests have low power and fold change estimates have low precision. The methodology developed in this project for the incorporation of gene network information into the statistical models was applied to a data set comprising log ratios of expression for 4609 genes of Saccharomyces cerevisiae, measured in the absence and presence of a transcription factor (TF). A weighted average of three independent ChIP-on-chip replicates of the ratio of gene expression with and without the TF was analyzed both with and without a gene network estimated in an independent study. Results were evaluated with leave-one-out cross-validation. The proposed methodology resulted in smaller variance estimates, and also allowed hidden signals in the data to be identified. Results demonstrated that if the network information is reliable, incorporating it in the statistical analyses produces better estimates of differential expression. Using a network to construct a covariance structure for microarray data, however, requires a careful choice of an appropriate kernel describing the relationships between genes. This topic deserves further research and it will be the focus of our research in the area of microarray data analysis.
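To illustrate how a network-derived covariance lets estimates borrow information across genes, the following minimal Python sketch may help. It is an illustration under stated assumptions, not the R package developed in the project. The report does not say which kernel was used; here we assume a regularized Laplacian kernel, K = (L + alpha*I)^-1, where L is the graph Laplacian of the gene network. We also assume the model y = beta + e, with beta ~ N(0, s2_b*K) and e ~ N(0, s2_e*I), and a known variance ratio lam = s2_e/s2_b. Under these assumptions the posterior mean of beta solves (I + lam*(L + alpha*I)) beta = y. All function names, edges, and log ratios below are hypothetical.

```python
def laplacian(n_genes, edges):
    """Graph Laplacian (degree matrix minus adjacency) of an undirected network."""
    lap = [[0.0] * n_genes for _ in range(n_genes)]
    for i, j in edges:
        lap[i][i] += 1.0
        lap[j][j] += 1.0
        lap[i][j] -= 1.0
        lap[j][i] -= 1.0
    return lap


def solve(matrix, rhs):
    """Solve a small dense linear system by Gaussian elimination with pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def network_shrunk_effects(y, edges, lam=1.0, alpha=0.1):
    """Posterior mean of gene effects under the assumed network prior.

    lam (ratio of residual to prior variance) and alpha (kernel
    regularization) are treated as known; in practice they would be estimated.
    """
    n = len(y)
    lap = laplacian(n, edges)
    system = [
        [lam * (lap[i][j] + (alpha if i == j else 0.0)) + (1.0 if i == j else 0.0)
         for j in range(n)]
        for i in range(n)
    ]
    return solve(system, y)


# Hypothetical log ratios for four genes; genes 0-1-2 form a chain, gene 3 is isolated.
log_ratios = [2.0, 0.0, 1.0, -1.0]
network_edges = [(0, 1), (1, 2)]
estimates = network_shrunk_effects(log_ratios, network_edges)
```

In this toy example the estimate for gene 1, whose observed log ratio is zero, is pulled toward its two neighbors and becomes positive, while the isolated gene 3 is only shrunk toward zero by the factor 1/(1 + lam*alpha). With no edges at all, the sketch reduces to the same gene-by-gene shrinkage for every gene, which makes the dependence of the results on the chosen kernel explicit.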

Publications

  • No publications reported this period


Progress 10/01/07 to 12/31/07

Outputs
OUTPUTS: During the first three months of the project we analyzed various microarray gene expression data sets using existing statistical and computational approaches, and also researched available bioinformatics tools for sequence analysis and gene annotation, which will be of primary importance for the development of the proposed methodology. The next step of the project will be the integration of biological knowledge into the statistical models for the analysis of expression profiling data.
PARTICIPANTS: PI: Dr. Guilherme J. M. Rosa. Grad student: Ana Ines Vazquez.
TARGET AUDIENCES: Anyone who uses microarray experiments to better understand, and test hypotheses about, gene activity and regulation in animals under different environmental conditions or stress factors would be interested in, and would benefit from, the models we will be developing in our project.

Impacts
Current methods for identifying differentially-expressed genes in microarray experiments typically treat genes as independent entities and do not take advantage of prior genomic or biological knowledge, such as common regulatory DNA sequence elements, gene function, or pathway membership. As a consequence, the resulting statistical tests have low power and fold change estimates have low precision. In this project we will develop a novel approach that uses sequence information to guide statistical analyses of microarray data. It is expected that such methodology will provide more powerful and reliable statistical analysis of microarray data.

Publications

  • No publications reported this period