Wheat leaf rust genome sequencing and comparative resources for rust fungi

Sponsoring Institution

National Institute of Food and Agriculture

Project Status

COMPLETE

Funding Source

NRI COMPETITIVE GRANT

Reporting Frequency

Annual

Accession No.

0215088

Grant No.

2008-35600-04693

Cumulative Award Amt.

(N/A)

Proposal No.

2008-04500

Multistate No.

(N/A)

Project Start Date

Sep 1, 2008

Project End Date

Aug 31, 2011

Grant Year

2008

Program Code

[51.0A]- Microbial Genomics (A): Genome Sequencing

Recipient Organization
Massachusetts Institute of Technology
(N/A)
Cambridge,MA 02139

Performing Department
(N/A)

Non Technical Summary
Wheat leaf rust (also known as brown rust of wheat), caused by Puccinia triticina, is one of the most serious diseases of wheat in North America and throughout the world. Severe epidemics of leaf rust and of stem rust, which is caused by the related species P. graminis f. sp. tritici, plague North American wheat production. Wheat resistance to cereal rusts is always precarious because new races evolve regularly and threaten sustainable crop production. By sequencing the genome of P. triticina and comparing it to that of P. graminis, we can define the features common to rust fungi as well as those that distinguish these two pathogens. Rust fungi (Pucciniales) are obligate plant parasites with a wide range of plant hosts that includes ferns, gymnosperms and angiosperms. The order Pucciniales comprises more than 7,000 species and is one of the most important groups of plant pathogens worldwide. This project will thus allow a wide community of researchers to use genomic approaches to elucidate the mechanisms of pathogenicity and the biotrophic lifestyle of rust fungi. The successful completion of this project will primarily impact the scientific community by providing information on the genome and gene expression of an important fungal plant pathogen. Several groups are prepared to use the genome sequence for work in proteomics, genetic mapping, and expression analysis. A comparative website for rust fungi will be developed at the Broad Institute to disseminate information on the genome of P. triticina and to allow comparison to other genomes, including that of P. graminis. These resources will be used by a broad community of scientists in the public and private sectors interested in basic fungal biology, host-parasite interactions, fungal evolution and the development of new methods for protecting agricultural crop plants from rust diseases. 
Furthermore, this project will provide educational opportunities for students to participate actively in fungal genomics and bioinformatics research and will broaden participation in basic science by underrepresented groups. In addition, this project will provide opportunities for postdoctoral education and for training a new generation of scientists. A broader impact of this research will be the potential to develop new methods for controlling rust diseases on cereal and other crops through a better understanding of the basic infection mechanism and disease process of obligate plant-parasitic fungi.

Animal Health Component

(N/A)

Research Effort Categories

Basic

100%

Applied

(N/A)

Developmental

(N/A)

Classification

Knowledge Area (KA)	Subject of Investigation (SOI)	Field of Science (FOS)	Percent
212	4020	1040	100%

Knowledge Area
212 - Pathogens and Nematodes Affecting Plants;

Subject Of Investigation
4020 - Fungi;

Field Of Science
1040 - Molecular biology;

Keywords

Goals / Objectives
The specific objectives of this proposal are to: (1) Sequence and assemble the complete genome of P. triticina isolate 1-1, Race 1 (BBBD), which is estimated to be between 100 and 120 Mb, using a hybrid of 454 and ABI (Sanger) Fosmid-end sequence; (2) Annotate gene structure using computational methods, 200,000 454 EST reads from each of four new cDNA libraries, and other available ESTs; (3) Evaluate P. triticina polymorphism and diversity by comparing the sequenced strain with three additional isolates using Illumina/Solexa sequence; (4) Promptly release all reads, assemblies, annotation, and discovered polymorphisms to the National Center for Biotechnology Information (NCBI), GenBank, and the Broad website; and (5) Develop education, training and outreach programs.

Project Methods
The sequencing strategy for Puccinia triticina leverages the range of technologies that are currently available. We sought to define a sequencing model that uses new and existing technologies to generate a high quality assembly at low cost. Our analyses of such data for Neurospora crassa indicated that a combination of several data types results in a high quality assembly at a fraction of the cost of a traditional Sanger assembly. In summary, 454 data will be used to generate the contigs, and Sanger data from Fosmid clones will be used to link contigs into larger scaffolds. We aim to sequence the P. triticina genome with 15-fold coverage in standard 454 reads, 30-fold physical coverage in 454 jumping libraries, and 10-fold physical coverage in end-sequenced Fosmids (0.3-fold sequence coverage). The resulting assembly will cover the vast majority of the genome in long, contiguous, high quality scaffolds. This will allow us to cluster genes in neighborhoods, will give a good representation of genome structure and gene order, and will allow us to determine syntenic relationships with other rust fungi. We expect the consensus quality to be Q50, which means that we will have very few errors in gene models. To annotate the genome, we will generate Expressed Sequence Tags (ESTs) from four cDNA sources: heavily infected wheat leaf tissue, germinated urediniospores, infected meadow rue, and haustoria isolated from heavily infected leaves. Existing methods will be used to clone the cDNAs and sequence them by 454, generating ~200,000 reads per source. We will use the ESTs to train the de novo gene-finding algorithms FGENESH and GENEID, as well as to call genes directly using an in-house method called FindORFs. To combine and evaluate the prediction data, the in-house automated gene caller will be used to select the best gene model for each locus. Targeted manual annotation will be carried out at loci where gene predictions conflict with EST evidence or BLAST evidence. 
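The coverage and quality targets above can be checked with a little arithmetic. The sketch below is illustrative only and is not part of the project's methods: the Fosmid insert size (about 40 kb) and the usable Sanger read length (about 600 bp) are assumptions that the proposal does not state, and the function names are hypothetical. Under those assumptions, 10-fold physical coverage in end-sequenced Fosmids corresponds to roughly 0.3-fold sequence coverage, in line with the figure quoted above, and a Phred-scaled consensus quality of Q50 corresponds to an expected error rate of about one base in 100,000.

```python
import math

# Assumptions (not stated in the proposal): a typical Fosmid insert of ~40 kb
# and ~600 bp of usable sequence per Sanger end read.
FOSMID_INSERT_BP = 40_000
SANGER_READ_BP = 600


def sequence_coverage(physical_coverage, insert_bp, read_bp, reads_per_clone=2):
    """Fold sequence coverage implied by a given fold physical (clone) coverage."""
    return physical_coverage * reads_per_clone * read_bp / insert_bp


def clones_needed(physical_coverage, genome_bp, insert_bp):
    """Number of clones whose inserts span the genome `physical_coverage` times."""
    return math.ceil(physical_coverage * genome_bp / insert_bp)


def phred_error_probability(q):
    """Per-base error probability for a Phred-scaled quality score Q."""
    return 10 ** (-q / 10)


# 10-fold physical coverage in end-sequenced Fosmids, as targeted in the text.
fosmid_sequence_coverage = sequence_coverage(10, FOSMID_INSERT_BP, SANGER_READ_BP)

# Clone counts for the two ends of the estimated 100-120 Mb genome size.
clones_low = clones_needed(10, 100_000_000, FOSMID_INSERT_BP)
clones_high = clones_needed(10, 120_000_000, FOSMID_INSERT_BP)

# Expected consensus error rate at Q50.
q50_error = phred_error_probability(50)

print(f"Fosmid sequence coverage: {fosmid_sequence_coverage:.1f}-fold")
print(f"Fosmid clones for 100-120 Mb: {clones_low:,} to {clones_high:,}")
print(f"Q50 per-base error probability: {q50_error:.0e}")
```

Because sequence coverage scales with read length divided by insert length, the 0.3-fold figure does not depend on the genome size estimate; only the number of clones does.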
We aim to use Illumina/Solexa data to call polymorphisms in three additional isolates representing other important races, each with a different level of virulence and all collected from the Great Plains. By comparing these reads to the reference assembly, we will call high confidence SNPs between the strains. We will use the SNPs to calculate polymorphism rates for genes and to identify rapidly evolving genes, examining this set for genes potentially involved in pathogenesis. Identified polymorphisms will be made available for browsing and download on the Broad website. All Illumina/Solexa reads will be deposited in the NCBI short read database when it becomes available.
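One simple way to turn a set of called SNPs into per-gene polymorphism rates is to count the SNPs that fall inside each gene and normalize by gene length. The sketch below is a minimal illustration under stated assumptions, not the project's actual pipeline: the gene identifiers, scaffold names, coordinates and SNP positions are made up, the function name `snps_per_kb` is hypothetical, and the calculation ignores read depth, callable sites and the distinction between coding and noncoding changes.

```python
from bisect import bisect_left, bisect_right


def snps_per_kb(genes, snps_by_scaffold):
    """Return {gene_id: SNPs per kb} for genes given as
    (gene_id, scaffold, start, end) with 1-based inclusive coordinates.
    `snps_by_scaffold` maps a scaffold name to a sorted list of SNP positions.
    """
    rates = {}
    for gene_id, scaffold, start, end in genes:
        positions = snps_by_scaffold.get(scaffold, [])
        # Binary search counts the sorted positions that lie in [start, end].
        count = bisect_right(positions, end) - bisect_left(positions, start)
        length_bp = end - start + 1
        rates[gene_id] = 1000 * count / length_bp
    return rates


# Hypothetical toy data, for illustration only.
genes = [
    ("geneA", "scaffold_1", 1001, 3000),  # 2,000 bp
    ("geneB", "scaffold_1", 5001, 6000),  # 1,000 bp
    ("geneC", "scaffold_2", 1, 500),      # 500 bp
]
snps_by_scaffold = {
    "scaffold_1": [1500, 2500, 2999, 5500],
    "scaffold_2": [],
}

rates = snps_per_kb(genes, snps_by_scaffold)

# Rank genes from most to least polymorphic as candidates for closer study.
ranked = sorted(rates, key=rates.get, reverse=True)
for gene_id in ranked:
    print(f"{gene_id}\t{rates[gene_id]:.2f} SNPs/kb")
```

In practice a ranking like this would only be a first pass; genes at the top of the list would still need to be checked for alignment artifacts in repetitive regions before being treated as rapidly evolving.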

Progress 09/01/08 to 08/31/11

Outputs
Target Audience: By releasing an updated genome for leaf rust, we have provided wheat rust researchers with the best available reference genome for this species, which they can use to examine genetic differences within and between species.
Changes/Problems: Highly polymorphic and repetitive genomes such as that of P. triticina make it challenging to assemble highly contiguous sequence. Because of the difficulties we encountered in assembling this genome with 454 reads, we generated more libraries than originally planned, taking advantage of longer 454 reads and incorporating deep Illumina coverage. By additionally evaluating the output of many different assembly algorithms, we were able to select the best assembly for annotation and further analysis.
What opportunities for training and professional development has the project provided? This project has given all involved an opportunity to learn new skills for handling the variety of new data types it generated. We have shared analysis methods to standardize such approaches, in particular for Illumina variant identification and RNA-Seq analysis.
How have the results been disseminated to communities of interest? This assembly was submitted to NCBI under accession ADAS02000000 and released on the Broad Institute website.
What do you plan to do during the next reporting period to accomplish the goals? Nothing Reported

Impacts
What was accomplished under these goals? Using a combination of 454, Illumina, and Sanger sequencing technologies, we generated sequence from multiple libraries to assemble the genome of P. triticina isolate 1-1, Race 1 (BBBD). The resulting 135 Mb assembly was annotated using a combination of EST data and deep transcriptome Illumina reads generated by RNA-Seq of spore and infection samples; a total of 14,878 genes were predicted from these data. This assembly was submitted to NCBI under accession ADAS02000000 and released on the Broad Institute website. Using this genome as a reference, we contributed to a pilot project analyzing Illumina data from additional races to begin mapping SNPs and expression variants that are associated with race shifts in the population.

Publications

Type: Journal Articles Status: Published Year Published: 2014 Citation: Bruce M, Neugebauer KA, Joly DL, Migeon P, Cuomo CA, Wang S, Akhunov E, Bakkeren G, Kolmer JA, Fellers JP. 2014. Using transcription of six Puccinia triticina races to identify the effective secretome during infection of wheat. Frontiers in Plant Science. doi: 10.3389/fpls.2013.00520.
Type: Journal Articles Status: Published Year Published: 2013 Citation: Fellers JP, Soltani BM, Bruce M, Linning R, Cuomo CA, Szabo L, Bakkeren G. 2013. Conserved loci of leaf and stem rust share synteny interrupted by lineage-specific influx of repeat elements. BMC Genomics, Jan 29;14(1):60.