Source: CORNELL UNIVERSITY submitted to
FROM MOLECULAR ADAPTATION TO PROTEIN FUNCTION: NEW TOOLS FOR GENOME ANALYSIS
Sponsoring Institution
State Agricultural Experiment Station
Project Status
TERMINATED
Funding Source
Reporting Frequency
Annual
Accession No.
0192387
Grant No.
(N/A)
Project No.
NYC-150303
Proposal No.
(N/A)
Multistate No.
(N/A)
Program Code
(N/A)
Project Start Date
Dec 1, 2001
Project End Date
Nov 30, 2004
Grant Year
(N/A)
Project Director
Nielsen, R.
Recipient Organization
CORNELL UNIVERSITY
(N/A)
ITHACA,NY 14853
Performing Department
BIOLOGICAL STATISTICS & COMPUTATIONAL BIOLOGY
Non Technical Summary
New bioinformatics approaches to the study of evolution and gene function will be developed and implemented. The methods will be applied to various biological systems, including DNA sequences from viruses and insect pests.
Animal Health Component
(N/A)
Research Effort Categories
Basic
50%
Applied
(N/A)
Developmental
50%
Classification

Knowledge Area (KA) | Subject of Investigation (SOI) | Field of Science (FOS) | Percent
901 | 7310 | 1080 | 100%
Goals / Objectives
The research objectives of the project and the responsible team members are as follows:
1. Development of more realistic models of codon substitution to detect adaptive evolution along lineages at individual sites, by Drs. Yang and Nielsen.
2. Computer simulation to evaluate performance of detection methods, by Drs. Yang and Nielsen.
3. Development of methods for predicting viral epitopes, by Drs. Nielsen and Yang.
4. Analysis of large data sets, by Drs. Schmid, Yang, and Nielsen.
   - Identification of viral epitopes and amino acid sites under positive selection in viral sequence data.
   - Large-scale database searches for positive selection to identify groups of proteins evolving under positive selection.
   - Analysis of multiple families of duplicated genes to understand functional divergence after duplication.
5. Analysis of positive selection in rapidly evolving genes by DNA sequencing of homologous genes, by Dr. Schmid.
   - Nonconserved orphan proteins in Arabidopsis and Drosophila.
   - Positive selection in proteins of an insect pest.
6. Development of computer software, by Drs. Yang and Nielsen.
Project Methods
Protein domains under positive selection must be functionally important and may be responsible for functional differences between individuals and species. New likelihood and Bayesian methods for detecting positive Darwinian selection in protein-coding genes will be developed and applied to detect functionally important protein domains and amino acid sites. The nonsynonymous/synonymous substitution rate ratio (w = dN/dS) will be used as a measure of selective pressure on the protein. By identifying regions or sites in a protein with w greater than 1, genes or sites within a gene that have adaptive significance can be detected. Existing statistical methods will be extended to allow the selective pressure (w) to vary both among amino acid sites and among evolutionary lineages. Computer simulations will be performed to evaluate the utility and statistical properties of both new and old methods. The new methods will be implemented and made publicly available in PAML and other program packages. The new methods will be used in combination with bioinformatics approaches and DNA sequencing to identify novel, positively selected proteins from various organisms. This will include the analysis of viral sequences to detect novel epitopes that interact with the host immune system, a large-scale analysis of available sequences of proteins and protein families to elucidate the genome-wide role of positive selection in protein evolution, and a test of the hypothesis that many genes of unknown function are rapidly evolving due to positive selection. The proposed work is complementary to traditional functional and structural genomics approaches and has the potential to rapidly lead to the identification of proteins of medical or agricultural importance.
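
As a rough illustration of the reasoning in the methods above, the following Python sketch shows three steps that such an analysis might involve: interpreting an estimate of w, comparing a null model with a model that allows w greater than 1 by a likelihood ratio test, and computing the posterior probability that a site belongs to each w class by Bayes' rule. It is not the PAML implementation. The function names, the log-likelihood values, the class proportions, the per-site likelihoods, and the choice of two degrees of freedom for the test are all hypothetical assumptions made for the example.

```python
import math


def classify_omega(omega, tol=1e-9):
    """Interpret a dN/dS estimate (w) as a mode of selection."""
    if omega > 1.0 + tol:
        return "positive selection"
    if omega < 1.0 - tol:
        return "purifying selection"
    return "neutral evolution"


def lrt_pvalue_df2(lnl_null, lnl_alt):
    """Likelihood ratio test for two nested models.

    Assumes (hypothetically) that the alternative model has two more free
    parameters than the null, so the statistic is compared with a
    chi-square distribution with 2 degrees of freedom, whose survival
    function has the closed form exp(-x / 2).
    """
    stat = max(2.0 * (lnl_alt - lnl_null), 0.0)
    return stat, math.exp(-stat / 2.0)


def site_class_posteriors(class_proportions, site_likelihoods):
    """Posterior probability of each w class for one site (Bayes' rule).

    class_proportions: estimated proportion of sites in each class.
    site_likelihoods: likelihood of the site's data under each class.
    """
    joint = [p * l for p, l in zip(class_proportions, site_likelihoods)]
    total = sum(joint)
    return [j / total for j in joint]


# Hypothetical values, chosen only to exercise the functions.
label = classify_omega(2.3)
stat, pvalue = lrt_pvalue_df2(lnl_null=-1234.5, lnl_alt=-1228.5)
posteriors = site_class_posteriors([0.7, 0.2, 0.1], [0.001, 0.002, 0.02])

print(label)
print(stat, pvalue)
print(posteriors)
```

In this sketch, a site would be reported as a candidate for positive selection only if the likelihood ratio test rejects the null model and the site's posterior probability for the class with w greater than 1 is high. The cutoff for "high" is a further choice left to the analyst.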

Progress 12/01/01 to 11/30/04

Outputs
The research conducted under this grant has led to multiple publications on statistical methods in molecular evolution and population genetics, and to several important application papers. A major focus of this work is to generalize our previously published method for detecting positively selected sites to more biologically relevant models. First, a Bayesian approach for analyzing the pattern of nonsynonymous and synonymous mutations was developed. This method was published in Genetics, where the problem of estimating the ages of nonsynonymous and synonymous mutations was discussed. We have also developed several new methods for analyzing nonsynonymous and synonymous mutations in a likelihood framework. A paper published in Molecular Biology and Evolution describes how inferences regarding positive selection can be performed on the basis of a phylogeny, allowing variation in the strength of selection both among lineages in the phylogeny and among sites in the DNA sequence. In an application of this work, together with collaborators, we analyzed human, chimpanzee, and mouse data to detect selection in humans. This study was published in Science. A later study on similar problems is in press in PLoS Biology. We also worked with collaborators on applying the methods to a group of carnivorous plants. This study was published in PNAS. Several other application papers have been published in other journals. Other papers concentrate on improving the methods used to detect selection at specific sites. We have also conducted large-scale simulation studies to examine how robust the methods are to factors such as recombination and the presence of codon usage bias. Finally, we have published a paper on estimating the distribution of selection coefficients from comparative data, as described in the original grant proposal. We believe that this body of work provides a general framework for analyzing hypotheses regarding the action of selection from comparative data.
The impact of this research program is illustrated by the large number of citations to it in high-profile journals such as PNAS, Science, and Nature.

Impacts
The developed methods provide an integrative framework for analyzing hypotheses regarding the action of Darwinian selection at the level of DNA sequences. The methods have obvious applications in studies of molecular evolution, but they may also be applied to elucidate the immunological interactions between viruses and their infected hosts.

Publications

  • Richards, S., Y. Liu, B. R. Bettencourt, P. Hradecky, S. Letovsky, R. Nielsen et al. 2005. Comparative genome sequencing of Drosophila pseudoobscura: Chromosomal, gene, and cis-element evolution. Genome Research 15: 1-18.
  • Nielsen, R. and M. J. Hubisz. 2005. Comparative data is needed to detect selection. Nature 433: E6 (Brief communication arising).
  • Wong, W.H. and R. Nielsen. 2004. Detecting positive selection in non-coding regions of DNA sequences. Genetics 167: 949-958.
  • Wong, W., Z. Yang, N. Goldman, and R. Nielsen. 2004. Accuracy and power of statistical methods for detecting adaptive evolution in protein coding sequences and for identifying positively selected sites. Genetics 168: 1041-1051.
  • Jobson, R. W., R. Nielsen, L. Laakkonen, M. Wikstrom and V. A. Albert. 2004. Adaptive evolution of cytochrome c oxidase: infrastructure for a carnivorous plant radiation. Proc. Natl. Acad. Sci. 101: 18064-18068.
  • Clark, A. G., R. Nielsen, J. Signorovitch, T. C. Matise, S. Glanowski, J. Heil, E. S. Winn-Deen, A. L. Holden, and E. Lai. 2003. Positive selection in the human genome inferred from human-chimp-mouse orthologous gene alignments. Cold Spring Harbor Symposia on Quantitative Biology 68: 471-477.
  • Nielsen, R. and Akashi, H. 2003. Action of purifying selection on silent sites in the human genome. In The Encyclopedia of the Human Genome, Nature Publishing Group.


Progress 01/01/03 to 12/31/03

Outputs
The research conducted so far has led to multiple publications on statistical methods in molecular evolution and population genetics. In particular, a Bayesian approach for analyzing the pattern of nonsynonymous and synonymous mutations has been developed. This method has been published in Genetics, where the problem of estimating the ages of nonsynonymous and synonymous mutations was discussed. An illustration of the method on the hemagglutinin molecule from the influenza virus was published in an article in Science. Currently, we have two articles in press and one in review on the topic of analyzing nonsynonymous and synonymous rates of substitution. A major focus of this work is to generalize our previously published method for detecting positively selected sites to more biologically relevant models. For example, a paper in press in Molecular Biology and Evolution describes how inferences regarding positive selection can be performed on the basis of a phylogeny, allowing variation in the strength of selection both among lineages in the phylogeny and among sites in the DNA sequence. We are also currently conducting large-scale simulation studies to examine how robust the methods are to factors such as recombination and the presence of codon usage bias. Finally, we have submitted a paper on estimating the distribution of selection coefficients from comparative data, as described in the original grant proposal. We believe that this body of work provides a general framework for analyzing hypotheses regarding the action of selection from comparative data. The impact of this research program is illustrated by the large number of citations to it in high-profile journals such as PNAS, Science, and Nature.

Impacts
The developed methods provide an integrative framework for analyzing hypotheses regarding the action of Darwinian selection at the level of DNA sequences. The methods have obvious applications in studies of molecular evolution, but they may also be applied to elucidate the immunological interactions between viruses and their infected hosts.

Publications

  • Nielsen, R. and Yang, Z. 2003. Estimating the distribution of selection coefficients from phylogenetic data with applications to mitochondrial and viral DNA. Mol. Biol. Evol. 20(8):1231-1239.
  • Swanson, W.J., Nielsen, R. and Yang, Q. 2003. Pervasive Adaptive Evolution in Mammalian Fertilization Proteins. Mol. Biol. Evol. 20(1):18-20.
  • Bustamante, C.D., Nielsen, R. and Hartl, D.L. 2003. Maximum likelihood and Bayesian methods for estimating the distribution of selective effects among classes of mutations using DNA polymorphism data. Theor. Pop. Biol. 63(2):91-103.
  • Anisimova, M., Nielsen, R. and Yang, Z. 2003. Effect of recombination on the accuracy of the likelihood method for detecting positive selection at amino acid sites. Genetics 164(3):1229-1236.
  • Clark, A. G., Glanowski, S., Nielsen, R. et al. 2003. Inferring non-neutral evolution from human-chimp-mouse orthologous gene trios. Science 302:1960-1961.


Progress 01/01/02 to 12/31/02

Outputs
The research conducted so far has led to multiple publications on statistical methods in molecular evolution and population genetics. In particular, a Bayesian approach for analyzing the pattern of nonsynonymous and synonymous mutations has been developed. This method has been published in Genetics, where the problem of estimating the ages of nonsynonymous and synonymous mutations was discussed. An illustration of the method on the hemagglutinin molecule from the influenza virus was published in an article in Science. Currently, we have two articles in press and one in review on the topic of analyzing nonsynonymous and synonymous rates of substitution. A major focus of this work is to generalize our previously published method for detecting positively selected sites to more biologically relevant models. For example, a paper in press in Molecular Biology and Evolution describes how inferences regarding positive selection can be performed on the basis of a phylogeny, allowing variation in the strength of selection both among lineages in the phylogeny and among sites in the DNA sequence. We are also currently conducting large-scale simulation studies to examine how robust the methods are to factors such as recombination and the presence of codon usage bias. Finally, we have submitted a paper on estimating the distribution of selection coefficients from comparative data, as described in the original grant proposal. We believe that this body of work provides a general framework for analyzing hypotheses regarding the action of selection from comparative data. The impact of this research program is illustrated by the large number of citations to it in high-profile journals such as PNAS, Science, and Nature.

Impacts
The developed methods provide an integrative framework for analyzing hypotheses regarding the action of Darwinian selection at the level of DNA sequences. The methods have obvious applications in studies of molecular evolution, but they may also be applied to elucidate the immunological interactions between viruses and their infected hosts.

Publications

  • Bustamante, C.D., Nielsen, R., Sawyer, S.A., Olsen, K.M., Purugganan, M.D. and Hartl, D.L. 2002. The cost of inbreeding in Arabidopsis. Nature 416: 531-534.
  • Bustamante, C.D., Nielsen, R. and Hartl, D.L. 2002. A maximum likelihood method for analyzing pseudogene evolution: implications for silent site evolution in humans and rodents. Mol. Biol. Evol. 19: 110-117.
  • Nielsen, R. and Huelsenbeck, J.P. 2002. Detecting positively selected sites using posterior predictive p-values. Pac. Symp. Biocomp. 2002 (Eds. R. B. Altman, K. Dunker, L. Hunter, K. Lauderdale and T. E. Klein), pp. 576-578.
  • Yang, Z. and Nielsen, R. 2002. Codon Substitution Models for Detecting Molecular Adaptation at Individual Sites Along Specific Lineages. Mol. Biol. Evol. 19: 908-917.
  • Nielsen, R. 2002. Mapping Mutations on Genealogies. Syst. Biol. 51: 729-739.
  • Cai, S., Kabuki, D.Y., Kuaye, A.Y., Cargioli, T.G., Chung, M.S., Nielsen, R. and Wiedmann, M. 2002. Rational Design of DNA Sequence-Based Strategies for Subtyping Listeria monocytogenes. J. Clin. Microbiol. 40:3319-3325.
  • Knipple, D.C., Rosenfield, C.L., Nielsen, R., You, K.M., and Jeong, S.E. 2002. Evolution of the Integral Membrane Desaturase Gene Family in Moths and Flies. Genetics 162: 1737-1752.