Source: UNIV OF MASSACHUSETTS submitted to NRP
CONSTRUCTING GRAPHICAL MODELS OF PHOTOSYNTHETIC ORGANISMS FROM GENOMIC DATA TO DETECT CLIMATICALLY INDUCED CHANGES IN CELLULAR PATHWAYS
Sponsoring Institution
National Institute of Food and Agriculture
Project Status
COMPLETE
Funding Source
Reporting Frequency
Annual
Accession No.
0201574
Grant No.
(N/A)
Cumulative Award Amt.
(N/A)
Proposal No.
(N/A)
Multistate No.
(N/A)
Project Start Date
Oct 1, 2004
Project End Date
Sep 30, 2009
Grant Year
(N/A)
Program Code
[(N/A)]- (N/A)
Recipient Organization
UNIV OF MASSACHUSETTS
(N/A)
AMHERST,MA 01003
Performing Department
MICROBIOLOGY
Non Technical Summary
Atmospheric CO2 levels are rapidly rising coincident with upward trends in fossil fuel consumption. How will photosynthetic organisms, including crop species, respond? This specific project will advance our ability to predict the influence of climatic effects on molecular processes, including carbon sequestration, thereby enhancing our overall capacity to use molecular data to guide future agricultural practices and the decision-making process affecting global climate change.
Animal Health Component
(N/A)
Research Effort Categories
Basic
100%
Applied
(N/A)
Developmental
(N/A)
Classification

Knowledge Area (KA)    Subject of Investigation (SOI)    Field of Science (FOS)    Percent
201                    4010                              1103                      100%
Goals / Objectives
The long-term goal of our research is to integrate large-scale genomic data sets into probabilistic models that allow inferences and decisions regarding the trajectory of photosynthetic organisms in a changing global climate. The objectives of this particular proposal are to construct graphical models from sequence and laboratory expression data to discover relationships among genes and proteins that have responded to past climatic changes and to laboratory-manipulated environments. To meet these objectives we have the following three specific goals:
1. Construct comprehensive protein relationship graphs of cyanobacteria and chloroplasts. Working hypothesis: The cellular signaling, regulatory and metabolic networks of Prochlorococcus marinus MED4 are converging on a minimal structure similar in many ways to that of chloroplasts.
2. Derive Bayesian networks from gene expression data. Working hypothesis: By analyzing changes in expression levels in response to changes in culture environment, either alone or constrained by the protein relationship graphs, we can detect dependencies among proteins of unknown function.
3. Identify the sets of genes derived from horizontal gene transfer. Working hypothesis: Selective pressure in response to past changes in ocean chemistry is leading to horizontal gene transfer events that result in no net change in the primary enzyme activity, but are rewiring post-transcriptional regulatory networks.
Project Methods
Constructing graphical models of cyanobacteria and chloroplasts
In order to understand how cellular signaling, regulatory and metabolic networks are changing, it is necessary to have comprehensive graph-based models of cyanobacterial and chloroplast networks. We have used a graph-like illustration generated by hand in Figure 1 to communicate our hypothesis that non-thioredoxin-regulated proteins are replacing the thioredoxin-regulated proteins. To be more systematic in our approach, the sequence annotation regarding protein function and interactions resulting from the four completed and two pending Prochlorococcus and Synechococcus genome projects, as well as the completed Arabidopsis chloroplast and nuclear genomes, will be imported into an open source derivative of PathDB.
Deriving Bayesian networks from gene expression data
Graphical models represent a union between probability theory and graph theory and thus are a natural extension of our work on relationship graphs. Directed graphical models, also known as Bayesian networks, are popular with the artificial intelligence and machine learning communities. Bayesian networks have recently become popular in medical informatics, in part because they are capable of representing the causal relationships between variables in large data sets, instead of simply the correlations between variables. We will use the open source Matlab module developed by Kevin Murphy (http://www.ai.mit.edu/~murphyk/Software/BNT/bnt.html) for our analysis and follow procedures established by Nir Friedman for deducing relationships among genes in yeast (Pe'er et al. 2001; Segal et al. 2003). From either the gene expression data alone or the combined data we expect to be able to extract subnetworks important for adapting to a changing environment, particularly those pathways relevant to carbon sequestration.
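The idea of detecting a dependency between two genes from expression data can be illustrated with a minimal sketch. It does not use the Matlab toolbox named above; it is a plain Python illustration that compares a BIC-style score for "geneB has no parent" against "geneB depends on geneA". The gene names, the two-state discretization, and the sample values are hypothetical placeholders, not project data.

```python
import math
from collections import Counter

def log_likelihood(child, parents, data):
    """Maximum-likelihood log-likelihood of `child` given `parents` (discrete data)."""
    joint = Counter()  # counts of (parent configuration, child value)
    marg = Counter()   # counts of parent configuration
    for row in data:
        cfg = tuple(row[p] for p in parents)
        joint[(cfg, row[child])] += 1
        marg[cfg] += 1
    return sum(n * math.log(n / marg[cfg]) for (cfg, _), n in joint.items())

def bic_score(child, parents, data, n_states=2):
    """Log-likelihood minus a penalty for the number of free parameters."""
    n_params = (n_states - 1) * n_states ** len(parents)
    return log_likelihood(child, parents, data) - 0.5 * n_params * math.log(len(data))

# Hypothetical discretized expression states (1 = induced, 0 = not induced).
samples = (
    [{"geneA": 1, "geneB": 1}] * 9 + [{"geneA": 1, "geneB": 0}] * 1
    + [{"geneA": 0, "geneB": 0}] * 9 + [{"geneA": 0, "geneB": 1}] * 1
)

score_independent = bic_score("geneB", [], samples)
score_dependent = bic_score("geneB", ["geneA"], samples)
edge_supported = score_dependent > score_independent
```

A full structure search would score many candidate parent sets per gene and, as proposed above, could restrict candidates to edges present in the protein relationship graphs. A higher score for the dependent model indicates statistical dependence only; it does not by itself establish the direction of causation.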
Detecting horizontal gene transfer
Horizontal gene transfer can be difficult to detect, and interpretations can often be reversed when new sequence data are added. Fortunately, the four complete genome sequences from the Prochlorococcus/Synechococcus group and complete genomic sequences from at least six other cyanobacteria outside this group will greatly simplify our analysis. We will start with the set of predicted proteins in each completely sequenced Prochlorococcus and Synechococcus genome and run searches using BLASTP against the NCBI database. We will then overlay the horizontal gene transfer events on the graphical models described above and extract groups of connected nodes. Based on our findings shown in Figures 1 and 2, we expect to be able to more readily interpret the pattern of horizontal gene transfer when these events are placed in the context of a larger model.
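The step of overlaying horizontal gene transfer events on a graph and extracting groups of connected nodes can be sketched as follows. This is an illustrative sketch only: the function name, the protein identifiers, and the edge list are hypothetical, and it assumes the graph is available as a list of undirected protein-protein edges with a separate set of proteins flagged as putative transfers.

```python
from collections import deque

def connected_groups(edges, flagged):
    """Return connected components of the subgraph induced by `flagged` nodes."""
    adjacency = {node: set() for node in flagged}
    for a, b in edges:
        # Keep an edge only if both endpoints are flagged as putative transfers.
        if a in adjacency and b in adjacency:
            adjacency[a].add(b)
            adjacency[b].add(a)
    seen, groups = set(), []
    for start in sorted(adjacency):
        if start in seen:
            continue
        seen.add(start)
        queue, group = deque([start]), []
        while queue:  # breadth-first search from each unvisited node
            node = queue.popleft()
            group.append(node)
            for neighbor in sorted(adjacency[node]):
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        groups.append(sorted(group))
    return groups

# Hypothetical relationship graph and hypothetical transfer calls.
edges = [("p1", "p2"), ("p2", "p3"), ("p3", "p4"), ("p5", "p6")]
hgt_flagged = {"p1", "p2", "p4", "p6"}
groups = connected_groups(edges, hgt_flagged)
```

In this toy input, p1 and p2 form one group because they are directly linked, while p4 and p6 remain singletons because their neighbors were not flagged.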

Progress 10/01/04 to 09/30/09

Outputs
OUTPUTS: Presentations of results from this research project were given to research scientists at the Gordon Research Conference on Microbial Population Biology, the Society for the Study of Evolution, the International Conference on Systems Biology, the New England Molecular Evolutionary Biologists Meeting, and the International Conference on Bioinformatics. The material was also the subject of invited talks at the University of Connecticut, Massachusetts Institute of Technology, Boston University, and the University of Rhode Island. The software developed is freely available upon request.
PARTICIPANTS: Jinghua Hu, PhD student, University of Massachusetts; Zhiyi Sun, PhD student, University of Massachusetts.
TARGET AUDIENCES: Nothing significant to report during this reporting period.
PROJECT MODIFICATIONS: Nothing significant to report during this reporting period.

Impacts
Global climate change is an international problem that is already affecting the evolutionary trajectory of our planet's biota. In spite of the widely appreciated magnitude of this problem, we still have a limited ability to estimate current and long-term biological effects. To test whether the high-light adapted Prochlorococcus MED4 is experiencing a reduction in selection efficiency resulting from genetic drift, we examined two data sets: the environmental genome shotgun sequencing data from the Sargasso Sea and a set of cyanobacterial genome sequences. After integrating these data sets, we compared the evolutionary profile of a high-light Prochlorococcus group to that of a group of Synechococcus (a closely related group of marine cyanobacteria) that does not exhibit a similar small-genome syndrome. The average pairwise dN/dS ratios in the high-light adapted Prochlorococcus group are significantly lower than those in the Synechococcus group, leading us to reject the hypothesis that the Prochlorococcus group is currently experiencing higher levels of genetic drift. We then reconstructed the steps leading to Prochlorococcus genome reduction from the analysis of 12 Prochlorococcus and 5 marine Synechococcus strains. On average, Prochlorococcus had significantly more gene loss than Synechococcus. We show that the small genome size of Prochlorococcus was largely determined by massive loss of small-effect genes shortly after the split of Prochlorococcus and Synechococcus. Genes were lost from nearly all the functional categories, and on average lost genes had higher dN/dS ratios and lower codon adaptation index values than retained genes in every functional category, suggesting that genes with small selection coefficients were lost in Prochlorococcus.
These results imply that the majority of genes were deleted in a neutral or nearly neutral manner and that gene loss was not responsible for the emergence or the divergence of the two modern Prochlorococcus ecological groups. In addition, we have developed a novel approach to identify co-regulated genes using multiple gene-specific regulatory sequence motifs. Compared to existing approaches, our method of grouping genes based on multiple motifs improves predictions of co-regulation.
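The comparison of dN/dS ratios between two groups can be illustrated with a one-sided permutation test. This is a sketch under stated assumptions, not the statistical procedure used in the publications listed below: the function name and the dN/dS values are hypothetical placeholders, and the test treats the values as exchangeable, which pairwise estimates sharing a genome are not strictly.

```python
import random

def permutation_test(group_a, group_b, n_permutations=2000, seed=0):
    """One-sided test of whether mean(group_a) is lower than mean(group_b)."""
    rng = random.Random(seed)
    n_a = len(group_a)

    def mean_diff(values):
        first, second = values[:n_a], values[n_a:]
        return sum(first) / len(first) - sum(second) / len(second)

    pooled = list(group_a) + list(group_b)
    observed = mean_diff(pooled)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # reassign group labels at random
        if mean_diff(pooled) <= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly above zero.
    return observed, (hits + 1) / (n_permutations + 1)

# Hypothetical placeholder dN/dS values, not results from this project.
prochlorococcus_dnds = [0.05, 0.06, 0.04, 0.07, 0.05, 0.06]
synechococcus_dnds = [0.10, 0.12, 0.09, 0.11, 0.13, 0.10]
observed, p_value = permutation_test(prochlorococcus_dnds, synechococcus_dnds)
```

A negative `observed` with a small `p_value` would correspond to the pattern described above, where lower dN/dS in one group argues against relaxed purifying selection in that group.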

Publications

  • Richards TA, Dacks JB, Campbell SA, Blanchard JL, Foster PG, McLeod R, Roberts CW. Evolutionary origins of the eukaryotic shikimate pathway: gene fusions, horizontal gene transfer, and endosymbiotic replacements. Eukaryot Cell. 2006 5(9):1517-31.
  • Conlon EM, Alpargu G, Blanchard JL. Comparative genomics approaches for identifying genetic regulatory networks. Chance. 2006 19(3):45-48.
  • Hu J. On the reduction of biological complexity in Prochlorococcus. PhD Dissertation. Sept. 2008
  • Hu J, Blanchard JL. Environmental sequence data from the Sargasso Sea reveal that the characteristics of genome reduction in Prochlorococcus are not a harbinger for an escalation in genetic drift. Mol Biol Evol. 2009 Jan;26(1):5-13
  • Luo H, Sun Z, Arndt W, Shi J, Friedman R, Tang J. Gene order phylogeny and the evolution of methanogens. PLoS One. 2009 Jun 29;4(6):e6069.
  • Sun Z and Blanchard JL. Getting small in a big ocean: Massive gene loss is not responsible for the divergence of the high-light and low-light adapted ecotypes of Prochlorococcus. In preparation. 2010.


Progress 10/01/07 to 09/30/08

Outputs
OUTPUTS: A presentation of our recent work on "The role of mutation and selection in Prochlorococcus genome reduction" (Sun Z, Blanchard J) was given at Evolution 2008 (Society for the Study of Evolution), Minneapolis, MN. The computational method "Filtering of Environmental Metagenomic Sequences (FEMS)" developed in our Molecular Biology and Evolution publication has been uploaded to SourceForge (http://sourceforge.net/projects/fems/). FEMS is a model-based sequence-filtering scheme designed to characterize sequences belonging to specific phylogenetic groups and to filter out sequences that may cause problems in particular analyses.
PARTICIPANTS: Jinghua Hu, PhD student; Zhiyi Sun, PhD student; Supratim Mukherjee, PhD student.
TARGET AUDIENCES: Not relevant to this project.
PROJECT MODIFICATIONS: Nothing significant to report during this reporting period.

Impacts
To test whether the high-light adapted Prochlorococcus MED4 is experiencing a reduction in selection efficiency resulting from genetic drift, we examine two data sets, namely, the environmental genome shotgun sequencing data from the Sargasso Sea and a set of cyanobacterial genome sequences. After integrating these data sets, we compare the evolutionary profile of a high-light Prochlorococcus group to that of a group of Synechococcus (a closely related group of marine cyanobacteria) that does not exhibit a similar small-genome syndrome. The average pairwise dN/dS ratios in the high-light adapted Prochlorococcus group are significantly lower than those in the Synechococcus group, leading us to reject the hypothesis that the Prochlorococcus group is currently experiencing higher levels of genetic drift. A manuscript encompassing this work was accepted for publication in Molecular Biology and Evolution. Molecular Biology and Evolution is published on behalf of the Society for Molecular Biology and Evolution and is one of the top ranked evolutionary biology journals. One of the reviewers stated "The paper has made great strides to evaluate a challenging and critical component of molecular evolution (including using a novel filtering approach to make use of environmental metagenomic data)..."

Publications

  • Hu J, Blanchard JL. Environmental sequence data from the Sargasso Sea reveal that the characteristics of genome reduction in Prochlorococcus are not a harbinger for an escalation in genetic drift. Mol Biol Evol. 2008 Oct 8. [Epub ahead of print]
  • Hu J. On the reduction of biological complexity in Prochlorococcus. PhD Dissertation. Sept. 2008


Progress 10/01/06 to 09/30/07

Outputs
OUTPUTS: We presented the following posters: (1) Sequence data from the Sargasso Sea reveal more genetic drift in marine Synechococcus than in Prochlorococcus. Hu, Blanchard. 2006. New England Molecular Evolutionary Biologists Meeting. (2) A test of the ecological niche hypothesis for genome reduction in Prochlorococcus. Sun, Hu, Blanchard. 2006. New England Molecular Evolutionary Biologists Meeting. (3) Environmental sequence data from the Sargasso Sea reveals that the characteristics of genome reduction are not a harbinger for an accelerated accumulation of deleterious mutations in Prochlorococcus. Blanchard and Hu. 2007. Gordon Research Conference on Microbial Population Biology. Software programs developed from this project have been used in analyzing Clostridium and Escherichia genome sequences.
PARTICIPANTS: Jeffrey Blanchard (PI), Jinghua Hu (PhD student), Ben Saunders (PhD student), Zhiyi Sun (PhD student), Supratim Mukherjee (PhD student).

Impacts
To test whether the high-light Prochlorococcus MED4 is experiencing a reduction in selection efficiency from genetic drift, we developed a new method for filtering environmental genome shotgun sequencing data from the Sargasso Sea, which is rich in Prochlorococcus high-light ecotypes. We then used recently completed whole genome sequences to compare the high-light Prochlorococcus group to a closely related group of marine cyanobacteria, Synechococcus, that does not show the genomic symptoms ascribed to genetic drift in endosymbionts. The distinct profiles of the evolutionary dynamics in the two groups reject the hypothesis that there is currently a reduction in selection efficiency in the Prochlorococcus group.

Publications

  • No publications reported this period


Progress 10/01/05 to 09/30/06

Outputs
Based on analysis of the genome sequence data, we have found genetic signatures that are commonly found in endosymbionts, pathogens and eukaryotic organelles and are indicative of a reduction in selection efficiency. These results challenge the assumption that nearly all genomic and regulatory network variation found in this marine cyanobacterial group reflects adaptation to local ecological niches. A marked decrease in selection efficiency is extremely unusual in a free-living nonpathogenic bacterium and has important implications for modeling the evolutionary trajectory of this organism in response to a rapidly changing global climate. This work was presented as a poster at the International Conference on Systems Biology. As described below, before publishing our results we are assessing the Sargasso Sea data within the framework of our hypothesis that a reduction in selection efficiency in Prochlorococcus has led to reduced genome complexity. A PhD student in my lab, Jinghua Hu, has begun to implement approaches for EGSS classification using the published assembled contigs and the trace sequence data from the Sargasso Sea. The EGSS data set includes both phage and bacterial sequences. Our analyses have uncovered an uneven distribution of genetic variation among species that appears to be the result of insertion of phage genes derived from Prochlorococcus and Synechococcus genomes. This observation has been further supported by recent publications of Prochlorococcus and Synechococcus phage genomes carrying psbA, cobS and pst genes that are among our over-represented genes. Another surprise in the EGSS contig data set is that the variation within EGSS sequences is extremely high. Over 90% of the sequences differ by at least 30% at synonymous sites, and the contig data comprise sequences from just four of the seven samples.
This may reflect an extreme amount of microheterogeneity within Prochlorococcus populations, the existence of many distinct ecotypes, or simply an artifact of the sequence assembly process collapsing variation. We have developed a novel approach to identify co-regulated genes using multiple gene-specific regulatory sequence motifs (Conlon et al. 2006). Compared to existing approaches, our method of grouping genes based on multiple motifs improves predictions of co-regulation. Our long-term goal is not simply to generate a list of regulatory elements to be verified experimentally, but to generate regulatory predictions to test our hypothesis that a decrease in gene content has led to a decrease in regulatory complexity. Thus, we (Hu, Conlon and Blanchard) are applying the method to the eight sequenced Prochlorococcus and Synechococcus genomes by removing one or more genomes at a time from the analysis and generating an inferred regulatory network. By comparing the networks we can test whether the networks are shrinking and whether they are changing through a distributed or modular process.
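The leave-one-genome-out comparison described above can be sketched as a comparison of edge sets. This is a hypothetical illustration, not the method of Conlon et al. 2006: it assumes each inferred regulatory network is available as a list of undirected gene-gene edges, and the gene and genome labels are placeholders.

```python
def edge_set(edges):
    """Normalize undirected edges so (a, b) and (b, a) compare equal."""
    return {frozenset(edge) for edge in edges}

def jaccard(edges_a, edges_b):
    """Fraction of edges shared between two networks (1.0 = identical)."""
    a, b = edge_set(edges_a), edge_set(edges_b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def compare_to_full(full_network, reduced_networks):
    """Summarize each leave-one-out network against the full network."""
    return {
        left_out: {
            "edge_count": len(edge_set(edges)),
            "jaccard": jaccard(full_network, edges),
        }
        for left_out, edges in reduced_networks.items()
    }

# Hypothetical inferred networks; labels name the genome that was removed.
full = [("geneA", "geneB"), ("geneB", "geneC"), ("geneC", "geneD")]
reduced = {
    "genome_1": [("geneB", "geneA"), ("geneB", "geneC")],
    "genome_2": [("geneA", "geneB"), ("geneB", "geneC"), ("geneC", "geneD")],
}
summary = compare_to_full(full, reduced)
```

Edge counts that fall when a genome is removed would point to a shrinking network. Whether the lost edges cluster in one part of the graph or are spread throughout would bear on the modular versus distributed question.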

Impacts
The intellectual merit of our research lies in the development of innovative laboratory and computational approaches to test whether the reduction in genome complexity is driven by a genome-wide decrease in the efficiency of selection in response to global climate change. Global climate change is an international problem that is already affecting the evolutionary trajectory of our planet's biota. In spite of the widely appreciated magnitude of this problem, we still have a limited ability to estimate current and long-term biological effects. Our research has potential broad impacts on the decision-making processes affecting global climate change because it addresses the impact of environmental change on a widespread group of marine photosynthetic microorganisms that may have already been dramatically affected by climate change occurring in this past century.

Publications

  • Richards TA, Dacks JB, Campbell SA, Blanchard JL, Foster PG, McLeod R, Roberts CW. 2006. Evolutionary origins of the eukaryotic shikimate pathway: gene fusions, horizontal gene transfer, and endosymbiotic replacements. Eukaryot Cell. 5(9):1517-31.
  • Conlon EM, Alpargu G, Blanchard JL. 2006. Comparative genomics approaches for identifying genetic regulatory networks. Chance. 19(3):45-48.


Progress 10/01/04 to 09/30/05

Outputs
Our preliminary analyses have continued to generate new hypotheses from, and raise new challenges in, the analysis of genomic and environmental shotgun sequence data. The striking similarities between Prochlorococcus genome characteristics and those of endosymbionts and obligate pathogens suggest a paradoxical reduction in selection efficiency, although much more work is needed to rule out selection for these characteristics and to determine whether the reduction in selection efficiency results in the accumulation of slightly deleterious mutations. The causes of variation in genome size were analyzed using software we developed called the "Genome Flux Analyzer" (Tolopko and Blanchard, in preparation) to place gene loss, gene duplication, horizontal gene transfer events, and the appearance of "unique" genes (genes which do not have a significant match in the database) in a phylogenetic context. Our estimates suggest that the genome of MED4 has 560 fewer genes than the last common ancestor of the Prochlorococcus/Synechococcus clade, with a total of 947 genes being lost and 387 new genes arising. Thus, reduction in gene content seems to be a dominant and ongoing process for Prochlorococcus, and may result in progressively smaller genomes, particularly for high-light adapted Prochlorococcus MED4 and MIT9312. Overall, the differences in genome size, coding capacity and corresponding physiological and ecological differences are primarily the result of gene loss and do not reflect adaptive gain-of-function differences. Thus, it may be that the genetic changes occurring, such as the loss of genes involved in photosynthesis and carbon dioxide fixation, are the result of the current episode of climate change and may be more pronounced in Prochlorococcus MED4 because of a general reduction in the efficiency of selection and/or a general increase in the mutation rate.
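The gene-content estimate above is a simple tally of gains and losses along a lineage, which the following sketch reproduces. It is not the Genome Flux Analyzer; the function name and the per-branch input format are assumptions, and only the totals reported above (947 genes lost, 387 genes gained) are used as input.

```python
def net_gene_change(branch_events):
    """Sum gene gains minus gene losses over the branches of one lineage.

    `branch_events` is a list of (genes_lost, genes_gained) tuples, one per
    branch between the ancestor and the present-day genome.
    """
    total_lost = sum(lost for lost, _ in branch_events)
    total_gained = sum(gained for _, gained in branch_events)
    return total_gained - total_lost

# Aggregate totals reported above for MED4, entered as a single branch because
# the per-branch breakdown is not given in this report.
med4_net = net_gene_change([(947, 387)])
```

The result, -560, matches the reported difference of 560 fewer genes in MED4 than in the last common ancestor of the clade.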
Analysis of levels of genetic variation between protein coding regions in whole genome sequence data revealed that the differences in amino acid distances between the high-light Prochlorococcus and Synechococcus affect nearly all genes, suggesting that the increase in distance may be the result of a decrease in purifying selection, an increase in mutation rate, or both. We have begun to implement proof-of-concept approaches for EGSS classification using the published assembled contigs and the trace sequence data from the Sargasso Sea. Surprisingly, there exists not only a nearly complete composite Prochlorococcus genome, as was reported, but also marine Synechococcus-like sequences that map to 94% of the Synechococcus WH8102 genome. Another surprise in the EGSS contig data set is that the variation within EGSS sequences is extremely high. Over 90% of the sequences differ by at least 30% at synonymous sites, and the contig data comprise sequences from just four of the seven samples. This may reflect an extreme amount of microheterogeneity within Prochlorococcus populations, the existence of many distinct ecotypes, or simply an artifact of the sequence assembly process collapsing variation.

Impacts
Atmospheric CO2 levels are rapidly rising coincident with upward trends in fossil fuel consumption. How will photosynthetic organisms, including crop species, respond? The size and complexity of this issue have led to controversial international resolutions and action by national funding agencies including the USDA, NSF and DOE. The answer is critical to determining what impact our changing environment will have on our agricultural practices and whether the ecological response of photosynthetic organisms will mitigate or intensify global climate change. This project's merit is in determining the past and future evolutionary trajectories of a marine photosynthetic bacterium in response to varying climatic factors. Marine cyanobacteria, primarily members of the Prochlorococcus/Synechococcus group, are the most numerically abundant photosynthetic organisms on Earth, contribute up to 80% of the oceanic primary production, and thus have become important objects of study for modeling global climate change. This specific project will advance our ability to predict the influence of climatic effects on molecular processes, including carbon sequestration in Prochlorococcus populations.

Publications

  • Blanchard, J.L. 2004. Bioinformatics and Systems Biology, rapidly evolving tools for interpreting plant response to global climate change. Field Crops Research. 90:117-131.


Progress 10/01/03 to 09/30/04

Outputs
New Project

Impacts
No Impact yet.

Publications

  • No publications reported this period