Source: CHILDREN'S HOSPITAL & RESEARCH CENTER, OAKLAND submitted to NRP
GENOME SEQUENCING AND EVOLUTION OF CHLAMYDOPHILA AND CHLAMYDIA SPECIES OF ANIMAL ORIGIN
Sponsoring Institution
National Institute of Food and Agriculture
Project Status
COMPLETE
Funding Source
Reporting Frequency
Annual
Accession No.
0219391
Grant No.
2009-65109-05760
Cumulative Award Amt.
$1,000,000.00
Proposal No.
2009-01473
Multistate No.
(N/A)
Project Start Date
Sep 1, 2009
Project End Date
Aug 31, 2014
Grant Year
2009
Program Code
[91311]- Microbial Genomics Sequencing
Recipient Organization
CHILDREN'S HOSPITAL & RESEARCH CENTER, OAKLAND
747 52ND STREET
OAKLAND,CA 94609
Performing Department
Center for Immunobiology and Vaccine Development
Non Technical Summary
Chlamydiaceae are bacterial pathogens that are responsible for a diversity of severe and debilitating infections in livestock throughout the United States and the world. For example, they cause abortions in sheep, goats, pigs and cattle, reduced fertility among cattle, reproductive failure in pigs, enteritis and diarrhea among pigs and cattle, and severe respiratory and intestinal diseases among poultry, all of which facilitate transmission in barnyard and breeding facilities. Consequently, these diseases carry a huge economic burden yet we know very little about how to identify or control these infections. In addition, there is a very limited understanding of why similar and variant strains of Chlamydiaceae infect different animal hosts and cause such disparate diseases. Furthermore, there are no effective vaccines to prevent Chlamydiaceae infections. Because the genome (the collection of genes that make up an organism) carries a wealth of information about the organism and few genomes of Chlamydiaceae have been sequenced, the objective of this proposal is to sequence the genomes of the breadth of Chlamydiaceae species and strains that infect animals as well as to develop a comprehensive world wide web database of these sequences and our results for public access. We will analyze the genomes to identify the range in their diversity to help identify and differentiate strains, determine the potential for the emergence of new strains, explain the evolution of the organisms, and explore how these organisms cause various diseases in different animal hosts. The research will also inform drug and vaccine development to reduce the economic impact from Chlamydiaceae and ensure healthy livestock. Our collection of ~300 Chlamydiaceae animal strains will provide the necessary samples for the research and constitutes a national resource that we will make publicly available. 
We will disseminate the data and our results and interface with scientific and public communities by releasing genome sequences to public databases, presenting our findings at national/international meetings, working with colleagues in the Chlamydia and other fields of science, publishing our results, hosting a world wide web-based WikiLIMS site for educational purposes (including tutorials), and providing opportunities for all students and underrepresented groups to become the next generation of experts in genomics.
Animal Health Component
(N/A)
Research Effort Categories
Basic
100%
Applied
(N/A)
Developmental
(N/A)
Classification

Knowledge Area (KA) | Subject of Investigation (SOI) | Field of Science (FOS) | Percent
311 | 4010 | 1040 | 100%
Knowledge Area
311 - Animal Diseases;

Subject Of Investigation
4010 - Bacteria;

Field Of Science
1040 - Molecular biology;
Goals / Objectives
The objective of the proposal is to perform high-throughput genome sequencing of 32 Chlamydiaceae isolates from avian and mammalian livestock to advance our knowledge of 1) the range of Chlamydiaceae diversity to develop a more precise classification system for these organisms where strains and species correlate with disease phenotypes and/or host specificity and 2) genome structure, genetic reshuffling, and evolution of Chlamydiaceae. We will enhance the value of the genomes by providing extensive, consistent automated and manual annotation. Through comparative genomics, we will identify the diversity of strains and species from genome organization, gene composition, prophages and plasmids, and decipher how Chlamydiaceae evolve by determining evidence for positive selection and identifying rates and mechanisms of genetic reshuffling among all species/strains. The genomes will be available for future studies of metabolism and functional genomics as well as to develop interventions (e.g., drug targets, vaccines) to reduce the economic impact of Chlamydiaceae and ensure healthy livestock. The Milestones and Outputs are to: 1) develop a web-based WikiLIMS with software analysis tools, tutorials, and links incorporated for initial team use and then for open access (1st quarter (Q), year 1 for team use; online year 3). WikiLIMS will be installed on CHORI's Xserver and will be open to the worldwide web, including all tools/software and tutorials, making it a public educational site. Tutorials will be documents developed to ensure that users can easily use the tools/software that we will employ for data analyses in this proposal; 2) generate genomic DNA for all 32 species and strains (1st year into 1st Q, year 2). Clonal bacterial isolation and genomic DNA purification will be performed at CHORI for each isolate; 3) Complete genome sequencing and perform automated and manual annotation of the genomes along with all analyses (4th Q year 1 through years 2 & 3). 
All 454 sequencing, assembly and annotation will be performed by the Read team at Emory; manual curation will be performed by Dr. Read's team in consultation with Dean and Bruno teams. All genome analyses (gene identification, functional predictions and annotation, genome organization/structure, evidence of and statistical evaluation for genetic reshuffling, etc.) will be a collaborative effort of the Dean, Read and Bruno teams; 4) disseminate data (years 1-3). Ongoing data will be uploaded by the Read team onto the CHORI WikiLIMS, the NCBI microbial genomes database and GenBank as generated; updates will be provided to ensure ongoing public access; 5) publish all results (years 1-3). Publications will be a collaborative effort among Drs. Dean, Read, and Bruno teams. We have made provisions for publishing exciting results (e.g., novel virulence genes, transposons and plasmids) in year 1; and 6) attend appropriate national and international meetings (years 1-3). Drs. Dean, Read and Bruno will attend scientific meetings to promote the research and engage with the scientific community to improve annotation and analyses; Drs. Dean and Read will attend the annual Microbial Genomics Workshop.
Project Methods
Chlamydiaceae genomes will be shotgun sequenced using the 454/Roche GS-FLX Titanium instrument. Our experience sequencing Chlamydiaceae genomes (~1.1 Mbp) with this technology suggests that their small size and general absence of repeats make de novo sequencing appropriate for numerous genomes. We expect to obtain reads of 400+ nucleotides (nt) at 30- to 80-fold coverage for each genome. Raw data will be assembled using the 454 Newbler software. Contigs will be aligned to the phylogenetically closest finished genome; primers will be designed to sequence across gaps. DIYA will be used for automated annotation followed by manual annotation. Data management and visualization will be via WikiLIMS software. Genome sequences will be deposited at NCBI in real time. We anticipate identifying: 1) unique sequences in housekeeping and other genes that will aid strain differentiation; 2) genes not previously encountered in Chlamydiaceae, some possibly closely related to virulence genes of other bacteria that imply cross-taxa horizontal transfer events; 3) novel plasmids and prophages; 4) genetic reshuffling between different strains and species; and 5) genetic structures associated with this reshuffling (e.g., flanking prophages, IS elements, etc.), all of which will inform evolutionary tactics of the organism that are currently ill defined. We propose four modes for analyses: 1) Inspection: Each genome will be carefully inspected visually using the GBROWSE viewer or ARTEMIS (or ACT to compare to a close neighbor). This analysis often reveals patterns that more automated approaches overlook; 2) Protein and gene clustering: Genes and predicted proteins for all genomes will be extracted from the GFF database and compared in all-versus-all BLAST searches. Proteins will be clustered into orthologous groups using MCL Markov clustering, which will require optimizing the BLAST score parameters and the MCL inflation value for best results. 
Gene families known to have undergone expansions and contractions (e.g., pmp genes) will be monitored closely under different clustering parameters and, if needed, manually split. Clustering results will be used to build phylogenies to look for recombination and identify disrupted genes, the core gene set found in every genome, and accessory genes in genomes of diverse phylogeny but similar environment; 3) Genome phylogeny reconstructions: Whole genome phylogenies will be inferred by creating, for each genome, concatenated molecules out of core genes or computing trees based on maximum likelihood or parsimony. Alternatively the topology of individual protein trees for each orthologous protein set in the alignment will be averaged; 4) Multiple Genome alignment: Based on genome phylogenies, we will investigate sub-branches at the genome level using MAUVE. MAUVE breaks down genomes into locally-collinear blocks that represent minimal units of genome identity and can be used to estimate the minimal rearrangement path using GRIMM. MAUVE locates regions unique to each genome or genome groups that may be inserted prophages or horizontally transferred islands. SimPlot, Plato and other tools will also be used to identify lateral gene transfer events.
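
As a rough check on the sequencing plan above, the 30- to 80-fold coverage target can be converted into a read count. The sketch below assumes the ~1.1 Mbp genome size and 400 nt read length quoted in the methods and ignores read-length variation and reads lost to quality filtering; the function name `reads_needed` is hypothetical and not part of any tool named in the proposal.

```python
import math


def reads_needed(genome_bp: int, read_len_nt: int, fold_coverage: float) -> int:
    """Smallest read count such that reads * read_len / genome >= fold_coverage."""
    return math.ceil(genome_bp * fold_coverage / read_len_nt)


GENOME_BP = 1_100_000   # approximate Chlamydiaceae genome size from the text
READ_LEN_NT = 400       # expected read length from the text

low = reads_needed(GENOME_BP, READ_LEN_NT, 30)    # lower coverage target
high = reads_needed(GENOME_BP, READ_LEN_NT, 80)   # upper coverage target
print(f"{low:,} to {high:,} reads per genome")
```

Under these assumptions a single genome needs on the order of 82,500 to 220,000 reads.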

Progress 09/01/09 to 08/31/14

Outputs
Target Audience: We describe methods for disseminating the genome and other data for chlamydial researchers and the broader scientific community: 1. We created a genome project page at NCBI for each strain to be sequenced, with information about place of isolation, disease association, etc. This creates an entry, organized by species, on the NCBI "genomes in progress" page (http://www.ncbi.nlm.nih.gov/genomes/lproks.cgi), which scientists routinely check for updates on genome sequencing of organisms of interest and which therefore alerts the scientific community to our project; 2. Dr. Read has been a member of the National Interagency Genome Sequencing Steering Committee and sent a list of these genomes to the committee at the first monthly Steering Committee meeting for distribution to other government agencies; 3. The PI and Dr. Read advertised the project at the Chlamydia Basic Research Society (CBRS) meetings in 2013 and 2015 and at the American Society for Microbiology, the Biology of Genomes, Cold Spring Harbor Labs, and the International Conference on Microbial Genomics; 4. CHORI hosted a seminar and workshop for all scientists interested in the project in conjunction with the 2013 CBRS conference; 5. We participated in the Microbial Genomics Workshop (NSF-USDA) to disseminate data, technology and tutorials that have been developed through this research proposal. 
Changes/Problems: Nothing Reported
What opportunities for training and professional development has the project provided?
The analyses used state-of-the-art techniques for annotation, comparative genomics, and identifying genetic reshuffling, mechanisms of reshuffling and evolutionary strategies. These promoted teaching and training through an interactive, web-accessible WikiLIMS with educational material and tutorials, so that undergraduates (9), graduate students (3) and postdocs (3) in the Dean, Read and Bruno teams could take a genome (or multiple genomes) and analyze it using the tutorials and analysis tools on the Wiki. We also conducted a workshop held in conjunction with CBRS to bring together researchers who are interested in Chlamydiaceae strains and bacterial evolution. Inviting researchers from these separate fields who have a common interest provided a unique opportunity for synergy. Linking the symposium and workshop together also encouraged participation by researchers who are novices in bioinformatics, as well as experts in the field. There were 252 attendees at CBRS and many attended the workshop. There was free exchange of information, and a number of collaborations formed between researchers from diverse backgrounds and geographic regions. The Microbial Genomics Workshops held at the Plant & Animal Genome Meeting in San Diego were also invaluable for advancing learning that was translated back to the Dean, Bruno and Read labs and on to their collaborations. Dr. Read is on the advisory board for the MS Bioinformatics program at Georgia Institute of Technology and used the data to train interns (3) in the program. In addition, CHORI is located in Oakland, a city that serves under-represented ethnic groups from all over the world. The PI had 4 minority students and postdocs in her lab who were exposed to the genomics project. Drs. Dean, Bruno and Read's institutions have a commitment to communicating scientific knowledge and providing opportunities for minorities. 
The PI is involved in the Minority Undergraduate Research Program (NIH grant: T35 HL07807, Training for Minority College Students) at CHORI, where she mentored 2 minority students per year in basic science and bioinformatics over the last decade. Two of these students worked on the genomics project. Dr. Read has been involved in the "Genomics Course for Educators" (communicating basic genomics research to high school biology teachers), acting as a scientific advisor to "Your World" biotechnology magazine for school-age children, and taking part in television programs on phylogeny for Oregon Public Broadcasting Corp. as part of a series designed to help train high school teachers. The genomics project was a great opportunity to recruit and train minorities at CHORI, Emory and NMC.
How have the results been disseminated to communities of interest?
We describe here our methods for disseminating the genome and other data for our target audience of chlamydial researchers and the broader scientific community: 1. We created a genome project page at NCBI for each strain to be sequenced, with information about place of isolation, disease association, etc. This creates an entry, organized by species, on the NCBI "genomes in progress" page (http://www.ncbi.nlm.nih.gov/genomes/lproks.cgi), which scientists routinely check for updates on genome sequencing of organisms of interest and which therefore alerts the scientific community to our project; 2. Dr. Read has been a member of the National Interagency Genome Sequencing Steering Committee and sent a list of these genomes to the committee at the first monthly Steering Committee meeting for distribution to other government agencies; 3. The PI and Dr. Read advertised the project at the Chlamydia Basic Research Society meetings in 2013 and 2015 and at the American Society for Microbiology, the Biology of Genomes, Cold Spring Harbor Labs, and the International Conference on Microbial Genomics; 4. 
CHORI hosted a seminar and workshop for all scientists interested in the project in conjunction with the 2013 CBRS conference; 5. We participated in the Microbial Genomics Workshop (NSF-USDA) in all years to disseminate data, technology and tutorials that have been developed through this research proposal. 6. We published 6 peer-reviewed manuscripts, with 2 additional manuscripts in preparation. 7. We made our colleagues aware of the genomes and analyses available to them: Dr. Kaltenboeck (College of Veterinary Medicine, Auburn Univ., USA), Dr. Fukushi (Dept. of Veterinary Microbiology, Gifu Univ., Japan), Dr. Maurelli (Uniformed Services, USA), Dr. Sachse (Institute of Molecular Pathogenesis, Germany), Dr. Herrmann (Section of Bacteriology, Uppsala Univ., Sweden), Dr. Vanrompay (Dept. Molecular Biotechnology, Ghent University, Belgium), Dr. Greub (Microbiology Institute, Univ. of Lausanne, Switzerland), Dr. RS Gupta (Dept. of Biomedicine, McMaster University, Canada), Dr. Pospischil (Institute for Veterinary Pathology, University of Zurich, Switzerland) and Dr. Lehmkuhl (National Animal Disease Center, USDA).
What do you plan to do during the next reporting period to accomplish the goals? Nothing Reported

Impacts
What was accomplished under these goals? Advancing knowledge of Chlamydiaceae: 1. We developed a pipeline of software programs to analyze the genomes of the 32 different species and strains of Chlamydiaceae. These included phylogenetic reconstruction using UPGMA and PHYLIP (neighbor and protpars); YN00 to calculate dN/dS; identification and removal of core genes with homologous recombination using the Pairwise Homoplasy Index (PHI), Neighbor Similarity Score (NSS) and maximum χ2 methods (PhiPack); analysis of the effect of recombination relative to mutation (r/m) on each genome and estimation of coalescent time using ClonalFrame; attribution of the origins of recombinant segments using BLAST; generation of a co-ancestry matrix and population structure likelihood using fineSTRUCTURE; prediction of population structure likelihood using BAPS; and estimation of evolutionary rates and divergence times using BEAST. 2. Our major findings were: a. For C. trachomatis: Recombination had a significant effect on genetic diversification. We observed distance-dependent decay in linkage disequilibrium, indicating that this obligate intracellular parasite behaved intermediately between sexual and clonal extremes. Fifty-five genes were identified as having a history of recombination; 92 were under positive selection based on statistical tests. Twenty-three genes showed evidence of being under both positive selection and recombination, which included genes with a known role in virulence and pathogenicity (e.g., ompA, pmps, tarp). b. For C. psittaci: We analyzed 20 C. psittaci genomes from diverse strains representing the nine known serotypes of the organism as well as infections in a range of birds and mammals, including humans. Genome annotation revealed a core genome of 911 genes present in all strains. Our analyses showed that C. psittaci has a history of frequently switching hosts and of undergoing recombination more often than C. trachomatis. 
Evolutionary history reconstructions showed genomewide homologous recombination and evidence for whole plasmid exchange. Tracking the origins of recombinant segments revealed that some strains have imported DNA from as yet unsampled/unsequenced C. psittaci lineages or other Chlamydiaceae species. Three ancestral populations of C. psittaci were predicted, explaining the current population structure. Molecular clock analysis found that certain strains are part of a clonal epidemic expansion likely introduced into the US by South American bird traders, suggesting that psittacosis is a recently emerged disease originating in New World parrots. c. For C. abortus: We genome sequenced and analyzed isolates from avian, lower mammalian and human hosts. Based on core gene phylogeny, five isolates previously classified as Chlamydia abortus were identified as members of C. psittaci and C. pecorum. C. abortus is the most recently emerged species and a highly monomorphic group that lacks the conserved virulence-associated plasmid. Low-level recombination and evidence for adaptation to the placenta echo evolutionary processes seen in recently emerged, highly virulent niche-restricted pathogens such as Bacillus anthracis. In contrast, gene flow occurred within C. psittaci and other Chlamydiaceae species whereas the C. psittaci RTH strain, isolated from a red-tailed hawk (Buteo jamaicensis), is an outlying strain with admixture of C. abortus, C. psittaci and its own population markers. Additionally, an average nucleotide identity of <95% compared to other Chlamydiaceae species suggests that RTH belongs to a new species intermediary between C. psittaci and C. abortus. Hawks, as scavengers and predators, have extensive opportunities to acquire multiple Chlamydiaceae in their intestinal tract. This would facilitate transformation and homologous recombination with the potential for new species emergence. 
Our findings indicate that incubator hosts such as birds-of-prey likely promote Chlamydiaceae evolution, resulting in novel pathogenic lineages. Stated goals: 1. We developed a web-based WikiLIMS with software analysis tools, tutorials, and links incorporated for our multi-site team use. WikiLIMS was installed on CHORI's Xserver and used extensively by teams at CHORI, Emory and NMC; 2. Genomic DNA was generated for all 32 species and strains from clonal bacterial isolation and genomic DNA purification. The purified DNA was sent to Dr. Read for genome sequencing; 3. Complete genome sequencing and automated and manual annotation of the genomes were performed by the Read team at Emory. All genome analyses (gene identification, functional predictions and annotation, genome organization/structure, evidence of and statistical evaluation for genetic reshuffling, etc.) were performed as a collaborative effort of the Dean, Read and Bruno teams; 4. Data have been disseminated as described in the dissemination section of this report. Genomes have been deposited in the NCBI microbial genomes database and GenBank; 5. Our results to date have been published in 6 peer-reviewed manuscripts. All publications have been a collaborative effort among the Dean, Read, and Bruno teams. Two final publications will come from this research, covering C. suis and comparative genomics of all Chlamydiaceae; 6. We have attended appropriate national and international meetings over the course of the grant. Drs. Dean, Read and Bruno attended the Chlamydia Basic Research Society meetings in 2011, 2013, and 2015, each year of the annual Microbial Genomics Workshop, the annual Infectious Disease Genomics & Global Health meeting in Hinxton, UK, the American Society for Microbiology, the Wind River Conference on Prokaryotic Biology, and biannual meetings of the Annual International Meeting on Microbial Genomics to promote the research and engage with the scientific community to improve annotation and analyses; Drs. 
Dean and Read will attend the annual Microbial Genomics Workshop.
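
The distance-dependent decay in linkage disequilibrium reported above for C. trachomatis rests on a pairwise association statistic between polymorphic sites. The report does not say which statistic was used; the sketch below assumes the common r² measure for haploid genomes and biallelic sites, with toy data and hypothetical function names, simply to illustrate what "decay with distance" means.

```python
from itertools import combinations


def r_squared(site_a: str, site_b: str) -> float:
    """r^2 between two biallelic sites; one character per haploid genome."""
    if len(site_a) != len(site_b):
        raise ValueError("sites must cover the same genomes")
    n = len(site_a)
    ref_a, ref_b = site_a[0], site_b[0]   # arbitrary reference alleles
    p_a = site_a.count(ref_a) / n
    p_b = site_b.count(ref_b) / n
    p_ab = sum(a == ref_a and b == ref_b for a, b in zip(site_a, site_b)) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both sites must be polymorphic")
    d = p_ab - p_a * p_b                  # linkage disequilibrium coefficient D
    return d * d / denom


def ld_by_distance(sites: dict[int, str]) -> list[tuple[int, float]]:
    """(distance in bp, r^2) for every pair of sites, sorted by distance."""
    pairs = [
        (abs(pos_i - pos_j), r_squared(sites[pos_i], sites[pos_j]))
        for pos_i, pos_j in combinations(sorted(sites), 2)
    ]
    return sorted(pairs)


# Toy alignment columns keyed by genome position (4 genomes, hypothetical data).
toy_sites = {100: "AAGG", 150: "AAGG", 5000: "AGAG"}
decay = ld_by_distance(toy_sites)
for distance, r2 in decay:
    print(distance, r2)
```

In a strictly clonal population r² would stay high regardless of distance, while in a freely recombining one it would fall quickly; an intermediate pattern is what the text describes.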

Publications

  • Type: Journal Articles Status: Published Year Published: 2011 Citation: Somboonna N, Wan R, Ojcius DM, Pettengill M, Chang A, Joseph S, Hsu RJ, Read TD, and Dean D. Hypervirulent Chlamydia trachomatis clinical strain is a recombinant between lymphogranuloma venereum (L2) and D lineages. mBio 2011;2(3):doi:10.1128/mBio.00045-11. PMCID: PMC3088116
  • Type: Journal Articles Status: Published Year Published: 2011 Citation: Joseph SJ, Didelot X, Gandhi K, Dean D, Read TD. Interplay of recombination and selection in the genomes of Chlamydia trachomatis. Biology Direct 2011;6:28. PMCID: PMC3126793
  • Type: Journal Articles Status: Published Year Published: 2012 Citation: Joseph SJ, Didelot X, Rothschild J, de Vries HJ, Morré SA, Read TD, Dean D. Population Genomics of Chlamydia trachomatis: Insights on Drift, Selection, Recombination, and Population Structure. Mol Biol Evol 2012;29:3933-46. PMCID: PMC3494276
  • Type: Journal Articles Status: Published Year Published: 2013 Citation: Read TD, Joseph SJ, Didelot X, Liang B, Patel L, Dean D. Comparative Analysis of Chlamydia psittaci Genomes Reveals the Recent Emergence of a Pathogenic Lineage with a Broad Host Range. mBio 2013;4(2). doi: 10.1128/mBio.00604-12. PMCID: PMC3622922
  • Type: Journal Articles Status: Published Year Published: 2015 Citation: Wolff BJ, Ganakammal SR, Changayil S, Weil MR, MacCannel D, Rowe L, Frace M, Pesti D, Ritchie BW, Dean D, Winchell JM. Chlamydia psittaci Comparative Genomics Reveals Intraspecies Variations in the Putative Outer Membrane and Type III Secretion System Genes. Microbiology 2015; In Press
  • Type: Journal Articles Status: Submitted Year Published: 2015 Citation: Joseph SJ, Marti H, Didelot X, Castillo-Ramirez S, Read TD, and Dean D. Evolutionary link between Chlamydia psittaci and Chlamydia abortus and emergence of abortion among lower mammalian species and humans. 2015: submitted


Progress 09/01/10 to 08/31/11

Outputs
OUTPUTS: Progress based on stated Milestones: 1. WikiLIMS set up: WikiLIMS was installed by the Read bioinformaticist on CHORI's Xserver and is currently open to the Bruno, Read and Dean groups for the purpose of data exchange, discussions of results and progress notations. It is fully functional and has been used by the teams as an efficient interface. The groups are in the process of developing tutorials to ensure that users can easily use the tools/software that we are already using for data analyses, that have been employed in various publications of ours and others, and that will be used in this research. The Wiki will go public the end of year 3. 2. Genomic DNA for sequencing: The Dean lab has clonally purified the 13 C. psittaci, 13 C. suis and 10 C. abortus strains (see Table 1, below). Each strain has been purified from contaminating human DNA and the purified gDNA has been sent to the Read lab for genome sequencing. The remaining isolates will be clonally purified this year and sent to the Read lab. 3. Attend scientific meetings: We have been attending meetings to promote the research and engage with the scientific community to improve annotation and analyses. i) Novel hypervirulent Chlamydia trachomatis clinical strain is a recombinant between lymphogranuloma venereum (L2) and D lineages (talk). Infectious Disease Genomics & Global Health, Cold Spring Harbor/Wellcome Trust, Hinxton, UK September 12-15, 2010 ii) Discoveries in the genome of Chlamydiaceae. University of California at Berkeley Graduate Group in Bioengineering Fall 2010 Group Conference, Fallen Leaf, CA September 24-26, 2010 iii) Probing for Chi sites in Chlamydia. Microbial Genome Sequencing and Microbial Observatories Programs Awardee Workshop, January 15-16 and Plant & Animal Genome XIX Meeting, San Diego, CA January 15-19, 2011 iv) Recognizing recombination in C. trachomatis and the emergence of new virulent strains (talk). 
American Society for Microbiology, New Orleans, LA, May, 2011 v) Interplay of recombination and selection in the genomes of Chlamydia trachomatis. 55th Annual Wind River Conference on Prokaryotic Biology, Aspen Lodge, Estes Park, Colorado June 18-19, 2011 vi) Chlamydiaceae and the emergence of new virulent strains through lateral gene transfer (Invited Seminar). Institute of Molecular Pathogenesis, Jena, Germany June, 2011
PARTICIPANTS: The individuals who have worked on this grant include Dr. Deborah Dean (PI), Raymond Wan (worked on Wiki and bioinformatics analyses), Michael Landis (worked on Wiki and bioinformatics analyses), Marianna Martinez (worked on molecular biology aspects of project and preparing gDNA for genome sequencing), and Nicole Fernandez (worked on molecular biology aspects of project) from Dr. Dean's group; Dr. Tim Read (co-PI), Alexander Cheng (worked on preparing genomes and genome sequencing) and Sandeep Joseph (prepared bioinformatics workflow with Drs. Dean and Read) in the Read lab; and Dr. Bill Bruno (co-PI) and Mahak Kapoor (worked with Drs. Dean and Bruno on chi site analyses) in the Bruno lab. The Dean lab has provided training and professional development for a number of undergraduate and graduate students and minority individuals for their advancement in bioinformatics and genomics. A number of individuals currently working on the project are racial and ethnic minorities and some are new to the project (Martinez, Fernandez, Kapoor).
TARGET AUDIENCES: The groups currently served by the project are scientists and students interested in next-generation genomics. Some students represent racial and ethnic minorities. The efforts include didactic training on molecular biology techniques and bioinformatics related to genomics, delivered through laboratory instruction, experiential learning opportunities, and seminars to disseminate the information. 
PROJECT MODIFICATIONS: Nothing significant to report during this reporting period.

Impacts
5. Research advances: Last year, to provide an organized approach to informatics analysis of Chlamydiaceae genomes, we developed a computational pipeline. We initially evaluated an unusual clinical isolate, which we termed L2c, that appeared to belong to the lymphogranuloma venereum (LGV) group of strains. The isolate developed nonfusing, grape-like inclusions and a cytotoxic phenotype in culture, unlike LGV strains described to date. Deep genome sequencing revealed that L2c was a recombinant of L2 and D strains with conserved clustered regions of genetic exchange, including a 78-kb region and a partial, yet functional, toxin gene that was lost with prolonged culture. Indels (insertions/deletions) were discovered in an ftsK gene promoter and in the tarp and hctB genes, which encode key proteins involved in replication, inclusion formation, and histone H1-like protein activity, respectively. Analyses suggest that these indels affect gene and/or protein function, supporting the in vitro and disease phenotypes. Our data provided the first whole-genome evidence for recombination between a virulent, invasive LGV strain and a noninvasive common urogenital strain. Given the lack of a genetic system for producing stable Chlamydia mutants, identifying naturally occurring recombinants can clarify gene function and provide opportunities for discovering avenues for genomic manipulation. (SEE Somboonna N, et al. mBio 2011;2(3):doi:10.1128.) We further tested this workflow using 12 Chlamydia trachomatis and 1 Chlamydia muridarum (outgroup) genomes. We used comparative genomic analyses to assess the contribution of recombination and selection to the evolution of these genomes (SEE Joseph S, et al. Biol Direct. 2011; 6: 28.). Highlights were: Recombination had a significant effect on genetic diversification. We observed distance-dependent decay in linkage disequilibrium, indicating that this obligate intracellular parasite behaved intermediately between sexual and clonal extremes. 
Fifty-five genes were identified as having a history of recombination; 92 were under positive selection based on statistical tests. Twenty-three genes showed evidence of being under both positive selection and recombination, which included genes with a known role in virulence and pathogenicity (e.g., ompA, pmps, tarp). Chlamydophila psittaci (Cps) Genome Sequencing: gDNA preparations were shotgun sequenced to >20-fold average coverage using a 454 GS-FLX sequencer. Predicted proteomes of 5 draft Cps genomes were combined with those of completed Chlamydia species and clustered using the OrthoMCL pipeline. We identified 643 proteins conserved across Chlamydia and generated a list of proteins unique to each genome. Preliminary analysis suggests that genetic diversity is moderate to low in the 5 sequenced Cps. We are extending this work to analyze C. suis and C. abortus genomes and identify horizontally transferred genes. The final whole genome data will be generated by the end of 2011.
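
The conserved and genome-specific protein lists described above follow directly from the ortholog groups once clustering is done. The sketch below is a minimal illustration, not the project's pipeline: it assumes the clustering output has already been parsed into a mapping from group ID to (genome, protein) pairs, and the genome labels, protein IDs and function names are all hypothetical.

```python
def core_groups(groups: dict[str, list[tuple[str, str]]], genomes: set[str]) -> list[str]:
    """IDs of ortholog groups with at least one member in every genome."""
    core = []
    for group_id, members in groups.items():
        present = {genome for genome, _protein in members}
        if genomes <= present:            # every genome is represented
            core.append(group_id)
    return sorted(core)


def unique_proteins(groups: dict[str, list[tuple[str, str]]], genome: str) -> list[str]:
    """Proteins of `genome` whose group contains no other genome."""
    unique = []
    for members in groups.values():
        if {g for g, _protein in members} == {genome}:
            unique.extend(protein for _g, protein in members)
    return sorted(unique)


# Toy input: three genomes and four ortholog groups (hypothetical identifiers).
genomes = {"Cps1", "Cps2", "Ctr"}
groups = {
    "OG1": [("Cps1", "p1"), ("Cps2", "q1"), ("Ctr", "r1")],
    "OG2": [("Cps1", "p2"), ("Cps2", "q2")],
    "OG3": [("Ctr", "r3")],
    "OG4": [("Cps1", "p4"), ("Cps1", "p5"), ("Cps2", "q4"), ("Ctr", "r4")],
}

core = core_groups(groups, genomes)
ctr_only = unique_proteins(groups, "Ctr")
print(core, ctr_only)
```

Note that this definition counts a group as core even when a genome contributes paralogs (as in OG4); requiring exactly one member per genome is a stricter choice that is often preferred when the groups feed a concatenated phylogeny.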

Publications

  • Somboonna N, Wan R, Ojcius DM, Pettengill M, Chang A, Joseph S, Hsu RJ, Read TD, and Dean D. Hypervirulent Chlamydia trachomatis clinical strain is a recombinant between lymphogranuloma venereum (L2) and D lineages. mBio 2011;2(3):doi:10.1128.
  • Joseph SJ, Didelot X, Gandhi K, Dean D, and Read TD. Interplay of recombination and selection in the genomes of Chlamydia trachomatis. Biology Direct 2011;6:28.


Progress 09/01/09 to 08/31/10

Outputs
OUTPUTS: 1. WikiLIMS set up: WikiLIMS was installed on CHORI's Xserver and is currently open to the Bruno, Read and Dean groups for data exchange, discussions of results and progress notations. We are developing tutorials for easy use of the various tools/software that we are employing for data analyses. 2. Genomic DNA for sequencing: The Dean lab has clonally purified 13 C. psittaci strains. Each is being genome sequenced in the Read lab. The remaining isolates will be clonally purified this year. 3. Dissemination of results to communities: We have been attending meetings to promote the research and engage with the scientific community to improve annotation and analyses: i) Genetic reshuffling among Chlamydiaceae (talk). Microbial Genome Sequencing and Microbial Observatories Programs Awardee Workshop, January 9-10, and Plant & Animal Genome XVIII Meeting, San Diego, CA January 9-13, 2010; ii) Novel hypervirulent Chlamydia trachomatis clinical strain is a recombinant between lymphogranuloma venereum (L2) and D lineages (talk). Infectious Disease Genomics & Global Health, Cold Spring Harbor/Wellcome Trust, Hinxton, UK September 12-15, 2010; iii) To be determined. University of California at Berkeley Graduate Group in Bioengineering Fall 2010 Group Conference, Fallen Leaf, CA September 24-26, 2010; iv) To be determined. Microbial Genome Sequencing and Microbial Observatories Programs Awardee Workshop, January 15-16 and Plant & Animal Genome XIX Meeting, San Diego, CA January 15-19, 2011. 4. Develop bioinformatics pipeline (work-flow): To provide an organized approach to informatics analysis of Chlamydiaceae genomes, we developed a computational pipeline (Figure 1). We tested it using 13 C. trachomatis strains as a model: a) Comparative Genome & Phylogenetic Analysis: Any number of protein/nt sequences can be used to perform all-versus-all BLAST analysis. BLAST results are parsed using BioPerl modules. 
The pipeline formats the parsed data for cluster analysis to identify orthologous gene groups using OrthoMCL (Markov Clustering). Results from OrthoMCL are parsed and core genes are identified. Core genes are used for inferring both the species tree and gene trees using phylogenetic software (e.g., PHYLIP). Species trees are reconstructed by concatenating multiple sequence alignments (MSA) of all core genes. The pipeline can generate phylogenetic trees using Neighbor Joining (NJ), Weighbor, Maximum Likelihood (ML) and Maximum Parsimony. b) Recombination Analysis - We tested software that detects recombination in bacterial genomes, including tests of phylogenetic incongruence between the species tree and gene trees (the approximately unbiased (AU) test and the Shimodaira-Hasegawa (SH) test using CONSEL), the pairwise homoplasy index (PHI), MaxChi2, NSS and ClonalFrame. c) Positive Selection Analysis - We tested the branch-site test of Yang and Nielsen in Codeml (PAML) to assess positive selection at particular sites and lineages. The likelihood of a model that does not allow positive selection is compared to that of a model allowing positive selection on all lineages. The model allowing positive selection is evaluated using a likelihood ratio test (LRT), with the test statistic compared to a chi-square distribution. PARTICIPANTS: The individuals who have worked on this grant include Dr. Deborah Dean (PI), Raymond Wan (worked on Wiki and bioinformatics analyses), Michael Landis (worked on Wiki and bioinformatics analyses), Marianna Martinez (worked on molecular biology aspects of the project and preparing gDNA for genome sequencing), and Nicole Fernandez (worked on molecular biology aspects of the project) from Dr. Dean's group; Dr. Tim Read (co-PI), Alexander Cheng (worked on preparing genomes and genome sequencing) and Sandeep Joseph (prepared the bioinformatics workflow with Drs. Dean and Read) in the Read lab; and Dr. Bill Bruno (co-PI) and Mahak Kapoor (worked with Drs. Dean and Bruno on chi site analyses) in the Bruno lab. 
The Dean lab has provided training and professional development for a number of undergraduate and graduate students and minority individuals for their advancement in bioinformatics and genomics. A number of individuals currently working on the project are racial and ethnic minorities and some are new to the project (Martinez, Fernandez, Kapoor). TARGET AUDIENCES: The groups currently served by the project are scientists interested in next-generation genomics and students with the same interest. Some students represent racial and ethnic minorities. The efforts include didactic training on molecular biology techniques and bioinformatics related to genomics. Consequently, the project offers laboratory instruction, experiential learning opportunities, and seminars to disseminate the information. PROJECT MODIFICATIONS: There are no major changes.
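
To illustrate the likelihood ratio test described under Positive Selection Analysis above, the following is a minimal Python sketch and not the project's pipeline code. It assumes the log-likelihoods of the null model (no positive selection) and the alternative model (positive selection allowed) have already been obtained, for example from Codeml output. The function names and the example log-likelihood values are hypothetical, and one degree of freedom is assumed by default; the report itself does not state the degrees of freedom used.

```python
import math


def lrt_statistic(lnl_null, lnl_alt):
    """Return the LRT statistic 2 * (lnL_alt - lnL_null), floored at zero."""
    return max(0.0, 2.0 * (lnl_alt - lnl_null))


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square distribution.

    Only df = 1 and df = 2 are supported in this sketch, because both
    have simple closed forms that need no external libraries.
    """
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("this sketch supports only df=1 or df=2")


def lrt_p_value(lnl_null, lnl_alt, df=1):
    """P-value for rejecting the null model in favor of the alternative."""
    return chi2_sf(lrt_statistic(lnl_null, lnl_alt), df)


if __name__ == "__main__":
    # Hypothetical log-likelihoods, not results from this project.
    print(lrt_p_value(-1000.0, -998.0, df=1))
```

A gene would be flagged as a candidate for positive selection when the p-value falls below the chosen significance threshold, ideally after correcting for the number of genes tested.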

Impacts
Initial analysis of recombination sites (chi sites): Chi sites (crossover hotspot instigator sites) are hotspots for bacterial recombination found in E. coli and most other bacteria studied. They are recognized by the RecBCD complex, altering its activity so that double-strand break repair will often cause recombination at the chi site. The E. coli chi site sequence is GCTGGTGG, and this sequence is over-represented in many, but not all, bacteria. Bacteria that live in the same environment as E. coli commonly use a different chi sequence, which prevents excessive cross-species recombination. In C. trachomatis, the E. coli chi-site sequence is only slightly over-represented, occurring about one-tenth as frequently as in E. coli. Yet C. trachomatis does have recombination hotspots, as we have shown previously (Gomes et al. Genome Research 2007;17:50-60). Identifying the signature of chi sites in Chlamydia, if they exist, would be valuable for all of our research on recombination in Chlamydiaceae species. Computational methods for finding sites related to recombination include looking for over-represented words, or for words over-represented on one strand relative to the other. We applied these strategies using the R'MES program. Although some words are significantly over-represented, no single word appeared to be frequent enough to play an important role in recombination (chi sites appear hundreds of times in the E. coli genome, while in a random model the word would be expected only a few times). Using a different approach, which we believe to be a novel application of previous work by others, we have found a new class of sites that is, computationally, suggestive of a chi site for C. trachomatis. We will test this in the genomes of Chlamydiaceae species that will be generated as part of this proposal. Exact matches to this 9-base word are found 49 times in the 1 Mb genome of C. trachomatis, plus a significant number of 8-base versions in which only the first or last base differs from the 9-base version. The resources from the grant and the development of the bioinformatics pipeline (work-flow) described above under Outputs were invaluable for making progress and producing these initial outcomes.
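
The word over-representation idea described above can be sketched in a few lines of Python. This is a simplified illustration and not the R'MES method: R'MES scores words against Markov models of the sequence, whereas this sketch assumes a much cruder model in which bases are independent and drawn from the genome's overall base composition. All function names are hypothetical, and the sequence in the usage example is a toy string, not Chlamydia data.

```python
from collections import Counter

# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def count_overlapping(seq, word):
    """Count occurrences of word in seq, allowing overlapping matches."""
    count = 0
    start = seq.find(word)
    while start != -1:
        count += 1
        start = seq.find(word, start + 1)
    return count


def expected_count(seq, word):
    """Expected count of word under an independent-base model (assumption)."""
    freqs = Counter(seq)
    n = len(seq)
    prob = 1.0
    for base in word:
        prob *= freqs[base] / n
    return prob * (n - len(word) + 1)


def strand_counts(seq, word):
    """Counts of word on the given strand and on the opposite strand."""
    return (count_overlapping(seq, word),
            count_overlapping(seq, reverse_complement(word)))


if __name__ == "__main__":
    toy = "GCTGGTGG" + "AAAA" + "CCACCAGC"  # toy sequence, not real data
    chi = "GCTGGTGG"  # E. coli chi sequence quoted in the text
    print(strand_counts(toy, chi), expected_count(toy, chi))
```

Comparing the observed count with the expected count gives a rough over-representation ratio, and comparing the two strand counts gives a rough measure of strand bias; a Markov-model background would be needed for statistically meaningful scores.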

Publications

  • No publications reported this period