Genome Sequencing and Analysis of Erwinia chrysanthemi 3937

Sponsoring Institution

National Institute of Food and Agriculture

Project Status

COMPLETE

Funding Source

OTHER GRANTS

Reporting Frequency

Annual

Accession No.

0190107

Grant No.

2001-52100-11316

Cumulative Award Amt.

(N/A)

Proposal No.

2001-04679

Multistate No.

(N/A)

Project Start Date

Sep 15, 2001

Project End Date

Mar 31, 2004

Grant Year

2001

Program Code

[(N/A)]- (N/A)

Recipient Organization
UNIV OF WISCONSIN
21 N PARK ST STE 6401
MADISON,WI 53715-1218

Performing Department
ANIMAL HEALTH & BIOMEDICAL SCIENCES

Non Technical Summary
Erwinia chrysanthemi causes soft rot, the "common cold" of bacterial plant diseases. Erwinia soft rots chronically afflict many crops, and their economic toll is particularly high because disease often develops after the crop has accrued the cost of harvesting and distribution. The purpose of this project is to sequence the complete genome of E. chrysanthemi strain 3937. Genomic data for Erwinia will yield insight into how this pathogen causes disease, and possibly new strategies for controlling important plant diseases. We are striving for high quality finished genome sequence from this experimentally tractable model system to seed downstream functional genomics initiatives beyond the scope of this study. The bulk of data collection from random shotgun clone libraries will be conducted by the Joint Genome Institute (JGI) whose high-throughput capacity greatly reduces the cost of sequencing. Gap closure and sequence finishing will be done at the Genome Center of Wisconsin, University of Wisconsin. Annotation and analysis will be conducted through a multi-layered effort involving researchers with expertise in enterobacterial genomics, international Erwinia biologists, and students at the University of Wisconsin. We will compare the annotated Erwinia genome to other enterobacterial sequences to identify genes unique to the phytopathogen and genes common to related plant and animal pathogens.

Animal Health Component

(N/A)

Research Effort Categories

Basic

100%

Applied

(N/A)

Developmental

(N/A)

Classification

Knowledge Area (KA)	Subject of Investigation (SOI)	Field of Science (FOS)	Percent
212	4010	1040	100%

Knowledge Area
212 - Pathogens and Nematodes Affecting Plants;

Subject Of Investigation
4010 - Bacteria;

Field Of Science
1040 - Molecular biology;

Goals / Objectives
Sequence the complete genome of Erwinia chrysanthemi 3937. Annotate the genome sequence. Analyze the content and structure of the genome. Identify genes that are unique to the phytopathogen and genes common to related plant and animal enterobacterial pathogens.

Project Methods
Whole genome random shotgun libraries will be prepared and the Department of Energy Joint Genome Institute (JGI) will collect approximately 8-fold coverage of the genome by sequencing both ends of random small-insert library clones. A BAC library will be constructed and the Genome Center of Wisconsin will sequence both ends of individual random BAC clones to create a scaffold for assembly of the random sequence data from JGI. The BAC clones will also serve as templates for PCR and primer-walking to achieve final genome sequence gap closure. The genome will be annotated in a multi-phasic effort, beginning with automated processing of sequence similarity searches against public databases. This preliminary annotation will be publicly released through the JGI immediately following random shotgun data collection. The next stage of annotation will involve a distributed community effort. A web-based annotation system will be used to allow an evolving group of approximately 30 investigators, the International Erwinia Consortium, to undertake a joint annotation that will reflect their combined expertise. The annotation system will also be used in a classroom environment to provide an opportunity for graduate students and upper-level undergraduates to participate directly in genome annotation and analysis. The Erwinia chrysanthemi genome will be compared to other complete genomes of pathogenic and non-pathogenic enterobacteria to define the phylogenetic distribution of genes and to identify the genetic basis of commonalities and differences in basic biological processes among this phenotypically diverse group of closely related organisms.

Progress 09/15/01 to 03/31/04

Outputs
We sequenced and annotated the complete genome of Erwinia chrysanthemi 3937, a strain widely used by the community as a model system for research on the molecular biology and pathogenicity of soft-rot Erwinia species. The annotation was conducted in a distributed fashion by an international consortium of Erwinia researchers. We have drafted a manuscript describing this 4.9 Mb genome, focusing on gene products, particularly those with known or putative associations with virulence and host-microbe interactions. Among these are proteins involved with carbohydrate degradation, regulation, metal homeostasis, chemotaxis, oxidative stress, and a remarkable collection of secretion systems. The complete community annotation is publicly available in a database-driven internet resource called ASAP, which includes authorship information and the evidence used to support each line of annotation, along with descriptions of gene products using two different controlled vocabularies. ASAP also provides an environment for linking to high-throughput experimental data and community-contributed updates of the annotation beyond this initial project. Comparisons with published genomes of select enterobacteria revealed that approximately 2300 genes are likely derived from the ancestor of this group, and surprisingly, only 500 additional genes beyond this core are shared with Erwinia carotovora atroseptica (Pectobacterium atrosepticum).

Impacts
The genome sequence of E. chrysanthemi has already influenced a broad spectrum of research conducted in the individual laboratories of our international consortium. These efforts include studies of the function of single genes, characterizations of the genetic basis of phenotypes, bioinformatics and experimental studies of key virulence-associated regulons, and gene expression profiling of the pathogen in a host environment. The sequence has both enabled existing research programs and motivated new lines of investigation. The open collaborative nature of our annotation project provided researchers early access to the genome data, and this translated to a jumpstart on downstream experimentation. This model also encouraged participation of a diverse collection of scientists at all stages of their careers and has helped forge new collaborations between subsets of the consortium.

Publications

S. Yang, N.T. Perna, D.A. Cooksey, Y. Okinaka, S.E. Lindow, A.M. Ibekwe, N.T. Keen and C.-H. Yang. 2004. Genome-wide identification of plant up-regulated genes of Erwinia chrysanthemi 3937 using GFP based IVET Leaf Arrays. MPMI, 17(9):999-1008.

Progress 01/01/03 to 12/31/03

Outputs
The sequence of the Erwinia chrysanthemi strain 3937 genome is now complete. The ASAP system continues to serve as a vehicle for direct contribution of annotations and genome-scale experimental data. In total, the community annotation team contributed 24032 individual annotation records, including 12037 classifications of gene function using the MultiFun controlled vocabulary. Examination of the gene content has led to numerous insights into the biology of this organism. Highlights from the genome contents include two type IV secretion systems, an additional type II secretion system, new cell wall degrading enzymes, an astounding collection of 48 methyl-accepting chemotaxis proteins, and putative insecticidal toxins. Comparative analyses of the E. chrysanthemi genome with four other enterobacteria (2 Escherichia coli strains, Yersinia pestis and E. carotovora subspecies atroseptica) have been used to assemble lists of 1810 orthologous genes shared by all of these organisms, as well as catalogs of meaningful subsets, such as 2835 genes orthologous in the two Erwinia species. Comparative analyses of the E. chrysanthemi proteins to those of all other sequenced microbes in the NCBI RefSeq database were used to identify all genes more similar to those of other plant pathogens (approximately 500) than to those of the more closely related animal-associated enterobacteria. All comparative data and annotations will be released to the public using the ASAP system. The project team has begun writing a manuscript detailing our findings using a new technology called 'wiki' that allows multiple authors to simultaneously edit a series of html pages.
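The catalogs of shared genes described above amount to set operations over ortholog assignments. The following is a minimal sketch of that bookkeeping only, not the project's actual pipeline: the ortholog-group identifiers (og1, og2, ...), the genome labels, and the helper name shared_groups are all hypothetical, and the report does not say how orthologs were called.

```python
# Toy ortholog-group assignments per genome. Every identifier here is a
# hypothetical placeholder; real input would come from an ortholog-calling
# step that this report does not describe.
ortholog_groups = {
    "E. chrysanthemi 3937": {"og1", "og2", "og3", "og4", "og5"},
    "E. carotovora atroseptica": {"og1", "og2", "og3", "og4"},
    "E. coli strain A": {"og1", "og2", "og6"},
    "E. coli strain B": {"og1", "og2", "og6", "og7"},
    "Y. pestis": {"og1", "og2", "og3"},
}


def shared_groups(genomes):
    """Return the ortholog groups present in every listed genome."""
    return set.intersection(*(ortholog_groups[g] for g in genomes))


# Groups shared by all five genomes (the report's analogous list has 1810 genes).
core = shared_groups(ortholog_groups)

# Groups shared by the two Erwinia species (the report's analogous list has 2835).
erwinia_shared = shared_groups(
    ["E. chrysanthemi 3937", "E. carotovora atroseptica"]
)

# Groups shared by both Erwinia species but absent from the five-genome core.
erwinia_beyond_core = erwinia_shared - core

print(sorted(core))
print(sorted(erwinia_shared))
print(sorted(erwinia_beyond_core))
```

With real data, the same intersections and differences would yield the lists quoted in the report; the toy sets here only illustrate how the subsets relate to one another.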

Impacts
The impact of the genome sequence on E. chrysanthemi research is already becoming obvious. Many of the researchers who contributed to the genome annotation have already begun follow-up experiments on interesting observations from the genome project. A meeting of many of the participants will be held this summer in Madison, Wisconsin to coordinate high-throughput cooperative post-sequencing activities. This meeting will be jointly supported by the NSF and CNRS. The Erwinia chrysanthemi 3937 Genome Project home page is at http://www.ahabs.wisc.edu/~pernalab/erwinia/.

Publications

No publications reported this period

Progress 01/01/02 to 12/31/02

Outputs
Sequencing Progress - TIGR had collected just over 59,000 sequences from the two random shotgun clone libraries as of March, 2002. This number of sequences is sufficient for 8-fold coverage of a 3.7 Mb genome, the size previously estimated for E. chrysanthemi 3937 using PFGE. Surprisingly, the number of contigs in the original assembly of this data, and the number of contigs composed of single sequences, suggested a much larger (approximately 5 Mb) genome. In light of this, we revised our sequencing plan and TIGR collected additional data from the random libraries, bringing the total number of high-quality sequencing reads to 66,788 as of July, 2002. Assembly of the total random shotgun phase data produced 119 contigs, with 97 sequence gaps, spanned by clones already chosen from the library, and 22 physical gaps. Sequence finishing efforts as of January 2003 have closed all but 6 of the original sequencing gaps using primer-walking techniques with the primary clones. PCR products have been generated spanning 18 of the 22 physical gaps, and await primer-walking early in 2003. The final genome is expected to be approximately 5 Mb, a size similar to most other sequenced enterobacterial genomes.

Annotation and Analysis Progress - The Perna group developed a community annotation system called ASAP to facilitate a distributed annotation effort. Forty-two individuals from over twenty research groups around the world and ten University of Wisconsin students have actively participated in the annotation and analysis of the E. chrysanthemi genome using the ASAP system. Focusing on the largest contigs from the initial TIGR assemblies, these individuals have collectively annotated over 4830 open reading frames, including 2006 completely unknown proteins and 1163 additional hypothetical proteins conserved between ECH3937 and at least one other organism. Functions have been inferred for the proteins encoded by the remainder of the open reading frames and the proteins have been assigned to one or more MultiFun classification categories. Ching-Hong Yang and Nicole Perna are reviewing all contributed annotations for consistency and quality. We also conducted preliminary comparisons of the ECH3937 genome with published sequences from two animal-associated enterobacteria, Escherichia coli and Yersinia pestis, and a draft sequence of E. carotovora subspecies atroseptica, obtained from the Sanger Centre web site. The E. carotovora sequence data were produced by the Pathogen Sequencing Unit at the Sanger Institute in collaboration with the Scottish Crop Research Institute and can be obtained from http://www.sanger.ac.uk/Projects/E_carotovora/. In brief, these comparisons reveal that approximately half of the ECH3937 genes have homologs in all three other Enterobacteriaceae, although collinearity is restricted to blocks of a relatively small number of genes. The two Erwinia species appear to share a larger number of genes in common, but final analyses must await completion of both genome sequences.
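The coverage figures in the sequencing progress notes above follow from the standard relation coverage = (number of reads x mean read length) / genome size. The sketch below works through that arithmetic using only numbers quoted in this report. The mean read length is not stated anywhere in the report, so it is back-calculated here as an assumption from the "59,000 reads for 8-fold coverage of 3.7 Mb" statement. The gap check likewise assumes a single circular chromosome, in which N contigs are separated by N gaps.

```python
# Figures quoted in the 2002 progress report.
READS_MARCH_2002 = 59_000        # "just over 59,000 sequences"
READS_JULY_2002 = 66_788         # total high-quality reads
TARGET_COVERAGE = 8.0            # planned fold coverage
PFGE_GENOME_BP = 3_700_000       # original PFGE size estimate
REVISED_GENOME_BP = 5_000_000    # size suggested by the first assembly
CONTIGS = 119
SEQUENCE_GAPS = 97
PHYSICAL_GAPS = 22


def implied_read_length(n_reads, genome_bp, fold_coverage):
    """Mean read length (bp) implied by a stated read count and coverage."""
    return fold_coverage * genome_bp / n_reads


def fold_coverage(n_reads, read_len_bp, genome_bp):
    """Average fold coverage: total sequenced bases over genome size."""
    return n_reads * read_len_bp / genome_bp


# ASSUMPTION: read length is inferred, not reported.
read_len = implied_read_length(READS_MARCH_2002, PFGE_GENOME_BP, TARGET_COVERAGE)

# The same March reads spread over a 5 Mb genome give lower coverage,
# which is consistent with the decision to collect additional reads.
cov_march_5mb = fold_coverage(READS_MARCH_2002, read_len, REVISED_GENOME_BP)
cov_july_5mb = fold_coverage(READS_JULY_2002, read_len, REVISED_GENOME_BP)

# ASSUMPTION: one circular chromosome, so contigs and gaps are equal in number.
total_gaps = SEQUENCE_GAPS + PHYSICAL_GAPS

print(f"implied mean read length: {read_len:.0f} bp")
print(f"March reads on 5 Mb: {cov_march_5mb:.2f}x")
print(f"July reads on 5 Mb: {cov_july_5mb:.2f}x")
print(f"gaps: {total_gaps}, contigs: {CONTIGS}")
```

Under these assumptions the implied read length is roughly 500 bp, the March data set would cover a 5 Mb genome only about 5.9-fold, and the July total raises that to roughly 6.7-fold. The 97 sequence gaps plus 22 physical gaps match the 119 contigs reported.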

Impacts
The complete genome sequence of Erwinia chrysanthemi 3937 will impact a variety of different ongoing research projects and will give rise to genome-scale experiments aimed at dissecting how this pathogen survives and causes disease in host plants. The availability of the genome sequence will enhance the utility of E. chrysanthemi as a model system to study plant-microbe interactions and bacterial pathogenesis in general. The ASAP community annotation system developed for this genome has allowed researchers and students world-wide to train in genome analysis while contributing directly to the ongoing project.

Publications

J.D. Glasner, P. Liss, G. Plunkett III, A. Darling, T. Prasad, M. Rusch, A. Byrnes, M. Gilson, B. Biehl, F.R. Blattner, N.T. Perna. ASAP, a systematic annotation package for community analysis of genomes. Nucleic Acids Research 2003, Jan 1:31(1):147-151.
Y. Okinaka, C.H. Yang, N.T. Perna and N.T. Keen. Microarray profiling of Erwinia chrysanthemi 3937 genes that are regulated during plant infection. Mol Plant Microbe Interact. 2002 Jul; 15(7):619-29.

Progress 09/15/01 to 12/31/01

Outputs
The Erwinia genome sequencing project has only been underway for two and a half months. During this time we have made substantial progress. High molecular weight DNA was prepared for E. chrysanthemi strain 3937 by the Keen Laboratory and transferred to The Institute for Genomic Research (TIGR). Steve Gill and Derrick Fouts from TIGR are overseeing the whole genome random shotgun sequencing phase at TIGR. Thus far, two clone libraries have been constructed and passed quality control reviews. Sequence data is actively being collected from clones of both libraries. We anticipate completion of the random shotgun sequencing phase by the end of March 2002. The Perna laboratory has established a web site for the Erwinia genome project, which can be reached at: http://www.ahabs.wisc.edu:16080/pernalab/erwinia_main.htm. Software is under development in the Perna laboratory to facilitate community annotation. An international consortium of Erwinia experts will use this software to contribute to the project.

Impacts
This genome sequence will give an international network of researchers a resource to accelerate their investigations of how this bacterium causes plant disease and, we hope, to develop new strategies to control plant disease. The Erwinia genome will also provide an opportunity to compare closely related plant and animal pathogens to identify commonalities in the way they cause disease and escape host defenses.

Publications

No publications reported this period