Source: Massachusetts Institute of Technology submitted to
THE GENOME SEQUENCE OF PHYTOPHTHORA INFESTANS
Sponsoring Institution
National Institute of Food and Agriculture
Project Status
EXTENDED
Funding Source
Reporting Frequency
Annual
Accession No.
0205753
Grant No.
2006-35600-16623
Project No.
MASR-2005-05219
Proposal No.
2005-05219
Multistate No.
(N/A)
Program Code
23.2
Project Start Date
Dec 15, 2005
Project End Date
Dec 14, 2008
Grant Year
2006
Project Director
Nusbaum, H. C.
Recipient Organization
Massachusetts Institute of Technology
(N/A)
Cambridge,MA 02139
Performing Department
(N/A)
Non Technical Summary
Oomycetes, a group of eukaryotes that are evolutionarily distant from fungi, plants and animals, are the most devastating pathogens of dicot plants. They cause enormous economic damage to important crops and environmental damage in natural ecosystems. The most destructive and best-studied oomycete is Phytophthora infestans, the cause of late blight of potato and notorious as the agent of the Irish Potato Famine. Today P. infestans causes losses in potato production worldwide of over $5 billion each year, making it the single greatest pathogen threat to global food security. Phytophthora species are known to be pathogens of virtually all dicot crop plants, and have proven very difficult to manage. An annotated, high quality genome sequence of P. infestans will have broad impact on agriculture and plant pathology by greatly facilitating and accelerating the pace of research on this important agricultural pest. The overall goal of this project is to produce a high quality genome sequence of P. infestans at a reasonable cost, and to make this sequence an effective resource by providing high quality annotation. We will continue to coordinate our work closely with the Phytophthora community and release all data promptly to public databases. As we have observed for multiple organisms, the availability of high quality annotated genome sequence drives research in the field.
Animal Health Component
(N/A)
Research Effort Categories
Basic
100%
Applied
(N/A)
Developmental
(N/A)
Classification

Knowledge Area (KA)    Subject of Investigation (SOI)    Field of Science (FOS)    Percent
212                    4099                              1040                      100%
Goals / Objectives
The overall goal of this project is to produce a high quality genome sequence of P. infestans at a reasonable cost, and to make this sequence an effective resource by providing high quality annotation. To achieve this we have set the following objectives:
1. Generate a whole genome shotgun assembly representing 7X coverage
2. Build a dense genetic map and integrate with the genome assembly
3. Identify 2000 full length cDNAs
4. Annotate the genome with respect to genes and other features
5. Release genome data and associated information including EST alignments and results of homology searches in a browsable format
6. Develop education, training and outreach programs
Project Methods
We will apply the following approaches to the objectives of the project.
1. Generate a whole genome shotgun assembly representing 7X coverage. We will produce a high quality draft assembly of the P. infestans genome. Assembly will be performed with our Arachne assembler. The sequencing and assembly processes will be optimized to yield the highest quality assembly in the most cost-efficient manner.
2. Build a dense genetic map and integrate with the genome assembly. We will place a total of 3000 SNPs on the whole genome assembly, or an average of one every 79 kb. These SNPs are easily discovered computationally from the genome assembly. Genotyping of SNPs will be performed at the state-of-the-art BI genotyping facility.
3. Identify 2000 full length cDNAs. We will identify 2000 full length cDNAs (FLcDNAs) representing ~10% of the P. infestans gene set to provide a critical, highly reliable training dataset for gene annotations as well as an invaluable resource to the research community to support ongoing functional analyses. We will construct 5 cDNA libraries using the Clontech CapFinder PCR cDNA construction kit (BD Biosciences), and identify FLcDNAs that represent a wide range of biological functions from diverse developmental stages.
4. Annotate the genome with respect to genes and other features. Gene-finding algorithms will be trained for P. infestans using gene, EST and FLcDNA data. Experience with the P. sojae and P. ramorum genomes shows that even after training gene callers, making accurate gene calls is difficult. Accordingly, once training is complete, experts in gene annotation and P. infestans biology will team up to manually curate a statistical sampling of gene calls. Results will be used to validate gene calls, correct potential errors in gene prediction and fine-tune the gene callers.
5. Release genome data and associated information including EST alignments and results of homology searches in a browsable format. Data from the P. infestans sequencing project will be made freely available. Sequence traces will be sent to the National Center for Biotechnology Information (NCBI) trace repository; the genome assembly and assemblies of all large insert clones will be deposited at GenBank; ESTs will also be deposited into PFGD (www.pfgd.org) and integrated with existing P. infestans EST data. An additional data release will provide access to the automated annotation of the P. infestans genome sequence both in flat files and in a rich, browsable format.
6. Develop education, training and outreach programs. We will organize a series of teacher education workshops on "forensic pathogen DNA from the Irish potato famine" in partnership with high schools and museums. We will also develop a museum exhibit illustrating the life cycle of the pathogen, its present and past impact on humankind, and information on modern genomics research.
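The marker density quoted for the genetic map (3000 SNPs, an average of one every 79 kb) can be checked with simple arithmetic. The sketch below is illustrative only: the function name and the "implied span" are assumptions introduced here, not figures or tools from the project, and it assumes markers are spread evenly across the assembly.

```python
def mean_marker_spacing_kb(span_bp, n_markers):
    """Average distance between markers, in kb, if spread evenly over span_bp."""
    if n_markers <= 0:
        raise ValueError("n_markers must be positive")
    return span_bp / n_markers / 1_000


N_SNPS = 3000            # stated in the project methods
STATED_SPACING_KB = 79   # stated in the project methods

# Span implied by the two stated figures (an inference, not a reported number).
implied_span_bp = N_SNPS * STATED_SPACING_KB * 1_000

print(implied_span_bp / 1e6)                            # 237.0 (Mb)
print(mean_marker_spacing_kb(implied_span_bp, N_SNPS))  # 79.0 (kb)
```

Under the same even-spacing assumption, a 242 Mb genome would give a slightly larger average spacing of about 81 kb.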

Progress 12/15/05 to 12/15/06

Outputs
Annual report for NSF/USDA Genome Sequence of Phytophthora infestans. Proposal number 2005-05219. Progress report: During the first year of this project we have made significant progress toward achieving the stated goals of the Phytophthora infestans genome project.
Genome sequencing. We generated 3,141,755 whole genome shotgun reads containing 1,894,706,650 Q20 bases, representing ~9x coverage of the genome. Sequence data were generated from three clone types: high copy number plasmids with 4 kb inserts, low copy number plasmids with 10 kb inserts, and single copy Fosmids with 40 kb inserts. All the reads have been submitted to the NCBI trace repository.
Genome assembly. Two shotgun assemblies of the whole genome sequence data were performed. A preliminary assembly (version 0.5) was done when roughly half the data were available, and publicly released on July 19, 2006. When all whole genome sequencing was completed, an assembly of the full data set was done (version 1.0), and released on October 23, 2006. The version 1.0 release is available on the Broad Institute web site (http://www.broad.mit.edu/annotation/genome/phytophthora_infestans/Home.html). The assembled genome is available in GenBank under the accession AATU0100000. This version 1.0 assembly includes 190 Mb of the estimated 242 Mb of the full genome. The remainder is believed to be mostly high copy repeat and tandem repeat sequences, as almost all of the unassembled sequences align at high identity to portions of the assembled genome. The N50, or weighted median, contig size is 44.4 kilobases (kb) and the N50 supercontig, or scaffold, size is 1.57 megabases (Mb). The N50 is the sequence block size such that 50% of the assembled bases lie in blocks of that size or larger. Alignment of the available ESTs to the v1.0 assembly suggests it contains ~95% of the unique protein coding sequence in the genome. All alignments of the ESTs are available on our web site.
Annotation. Full automated gene annotations by our standard process are currently being computed and will be released as they become available. Work is ongoing in several other areas in support of the annotation process. We have end sequenced 500 putative full length cDNAs from an existing library. Two new cDNA libraries containing full length clones are being constructed. Existing EST and genome sequence data were used to generate 500 hand-curated gene models to serve as a training set for gene callers. Finally, we have developed a novel oomycete-specific comparative gene calling algorithm that is tuned to the existing oomycete genome datasets.
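The N50 statistic defined in the assembly summary can be computed as in the minimal sketch below. This is an illustration of the definition only, not the routine used by the Arachne assembler. The function name and the toy contig lengths are hypothetical and are not project data.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths (in bases).

    The N50 is the largest length L such that blocks of length >= L
    together contain at least half of all assembled bases.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must contain at least one value")


# Toy example (made-up lengths): total is 300, so half is 150.
# The two largest blocks (80 + 70) reach 150, so the N50 is 70.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))  # 70
```

Applied to contig lengths this gives the contig N50, and applied to scaffold lengths it gives the supercontig N50, which is why the two reported values differ.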

Impacts
Impact: Phytophthora infestans is the cause of late blight of potato and is notorious as the agent of the Irish Potato Famine. Worldwide, it causes over $5 billion in annual losses in potato production, making it the largest single pathogen threat to global food security. An annotated, high quality sequence of the P. infestans genome will have broad impact on agriculture and plant pathology by greatly accelerating the pace of research on this important agricultural pest, and lead to improved methods of detection and control.

Publications

  • No publications reported this period