Progress 06/01/07 to 09/30/08
Outputs OUTPUTS: During the reporting year we developed a statistical methodology to incorporate biological information related to gene networks into the models for analyzing microarray gene expression data. In our approach, a multivariate normal prior on gene effects, with a covariance structure derived from the network, is used so that information is borrowed across genes when estimating differential expression (i.e., fold changes) and testing hypotheses regarding such changes. A software package written in the R language was developed to implement the proposed methodology, as well as the traditional gene-specific analyses with shrinkage estimators of variance components. The software also performs leave-one-out cross-validation to compare the alternative data analysis strategies.
PARTICIPANTS: Dr. Guilherme J. M. Rosa: Dr. Rosa is the PI of the project. He worked directly on the development of the methodology for incorporating biological information into the models used to analyze gene expression, and supervised the whole project. Mrs. Ana Ines Vazquez: Mrs. Vazquez is a PhD student at the University of Wisconsin. She was responsible for writing the software package and for implementing a first set of simulations and the cross-validation involving the real data.
TARGET AUDIENCES: Anyone working with microarray experiments would be interested in, and would benefit from, the models we are developing in our project as a tool to better understand and test hypotheses regarding gene activity and regulation in animals under different environmental conditions or stress factors.
PROJECT MODIFICATIONS: Not relevant to this project.
Impacts Current methods for identifying differentially expressed genes in microarray experiments typically treat genes as independent entities and do not take advantage of prior genomic or biological knowledge, such as common regulatory DNA sequence elements, gene function, or pathway membership. As a consequence, the resulting statistical tests have low power and fold change estimates have low precision. The methodology developed in this project for incorporating gene network information into the statistical models was applied to a data set comprising log ratios of expression for 4609 Saccharomyces cerevisiae genes, measured in the absence and presence of a transcription factor (TF). A weighted average of three independent ChIP-on-chip replicates of the ratio of gene expression with and without TF was analyzed with and without a gene network estimated in an independent study. Results were evaluated with leave-one-out cross-validation. The proposed methodology resulted in smaller variance estimates and also uncovered hidden signals in the data. Results demonstrated that, if the network information is reliable, incorporating it into the statistical analyses produces better estimates of differential expression. Using a network to construct a covariance structure for microarray data, however, requires a careful choice of an appropriate kernel describing the relationships between genes. This topic deserves further research, and it will be the focus of our research in the area of microarray data analysis.
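As an illustration only, the following Python sketch shows one way a network-derived covariance could enter the estimation of gene effects. It is not the R package developed in this project. It assumes a simple model y = beta + e with e ~ N(0, sigma2 I) and prior beta ~ N(0, tau2 K), and it assumes a regularized Laplacian kernel K = (I + alpha L)^-1, where L is the graph Laplacian of the gene network. The kernel choice, the function names, and the parameters lam (= sigma2 / tau2) and alpha are all hypothetical. Under these assumptions the posterior mean of beta solves ((1 + lam) I + lam alpha L) beta = y.

```python
def graph_laplacian(n_genes, edges):
    """Return L = D - A for an undirected, unweighted gene network."""
    lap = [[0.0] * n_genes for _ in range(n_genes)]
    for i, j in edges:
        lap[i][j] -= 1.0
        lap[j][i] -= 1.0
        lap[i][i] += 1.0
        lap[j][j] += 1.0
    return lap


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are not modified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def network_posterior_mean(y, edges, lam=1.0, alpha=1.0):
    """Posterior mean of gene effects under the assumed prior N(0, tau2 K).

    K = (I + alpha L)^-1 is an assumed kernel, lam = sigma2 / tau2.
    The estimate solves ((1 + lam) I + lam * alpha * L) beta = y.
    """
    n = len(y)
    lap = graph_laplacian(n, edges)
    system = [
        [lam * alpha * lap[i][j] + ((1.0 + lam) if i == j else 0.0)
         for j in range(n)]
        for i in range(n)
    ]
    return solve(system, y)


if __name__ == "__main__":
    # Toy data: genes 0-1-2 form a path in the network, gene 3 is isolated.
    log_ratios = [2.0, 0.0, 2.0, 2.0]
    with_network = network_posterior_mean(log_ratios, [(0, 1), (1, 2)])
    gene_specific = network_posterior_mean(log_ratios, [])
    print("with network:   ", with_network)
    print("without network:", gene_specific)
```

In this toy example, gene 1 has an observed log ratio of zero but is connected to two genes with large log ratios, so its estimate is pulled toward its neighbors (about 0.4 rather than 0). The isolated gene 3 is shrunk only toward zero, exactly as in the analysis without a network. A different kernel would imply a different pattern of borrowing, which is why the kernel choice matters.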
Publications
- No publications reported this period
Progress 10/01/07 to 12/31/07
Outputs OUTPUTS: During the first three months of the project we analyzed various microarray gene expression data sets using existing statistical and computational approaches. We also researched available bioinformatics tools for sequence analysis and gene annotation, which will be of primary importance for the development of the proposed methodology. The next step of the project will be the integration of biological knowledge into the statistical models for the analysis of expression profiling data.
PARTICIPANTS: PI: Dr. Guilherme J. M. Rosa Grad student: Ana Ines Vazquez
TARGET AUDIENCES: Anyone working with microarray experiments would be interested in, and would benefit from, the models we will be developing in our project as a tool to better understand and test hypotheses regarding gene activity and regulation in animals under different environmental conditions or stress factors.
Impacts Current methods for identifying differentially expressed genes in microarray experiments typically treat genes as independent entities and do not take advantage of prior genomic or biological knowledge, such as common regulatory DNA sequence elements, gene function, or pathway membership. As a consequence, the resulting statistical tests have low power and fold change estimates have low precision. In this project we will develop a novel approach that uses sequence information to guide statistical analyses of microarray data. This methodology is expected to provide more powerful and reliable statistical analysis of microarray data.
Publications
- No publications reported this period