Progress 02/01/08 to 01/31/11
Outputs OUTPUTS: I am pleased to report that all of the technical and scientific objectives of this proposal are complete. We are now conducting a small community analysis of the completed genome to generate publications based on connections between the biology of M. destructor and its annotated genome and genes. Sequence generation and genome assembly. We performed nine 454 Titanium runs, generating 3,694,907,229 bp of fragment sequence with an average read length of 323.2 bp, or 23X coverage of the Hessian fly genome. To help with assembly we also generated 2,595,291 successful 3 kb paired-end reads, giving a total "clone" coverage of 6.48 Gb, or 41X clone coverage of the Hessian fly genome. Additionally, we performed six 20 kb paired-end Titanium runs, giving 338X clone coverage of the Hessian fly genome. We assembled this sequence to generate the Mdes 1.0 genome assembly. The assembly comprises 153 Mb of sequence and is available from NCBI, the BCM-HGSC website and Agripestbase at KSU. Alignment of EST sequences to the genome showed that it contains the vast majority of Hessian fly genes. The contig N50 length is 14 kb, and the scaffold N50 length is 756 kb. Sixty percent of the genome assembly was placed on M. destructor chromosomes using a physical map provided by Jeff Stuart. The quality of the assembly was assessed by alignment of RNA-seq data: more than 95% of RNA-seq transcripts could be aligned to the genome assembly, indicating that it is essentially complete. Transcript sequencing and genome annotation. We generated four Illumina lanes of RNA-seq data (95 bp read length, paired-end, ~250 bp insert size) from four different life stages: pooled female eggs (57M reads), female first-instar larvae (64M reads), male first-instar larvae (60M reads) and female third-instar larvae (50M reads). These data, together with protein sequences from other species and ab initio gene predictions, were used to run MAKER 2.0 and generate evidence-based gene models for M. destructor.
We annotated 13,284 protein-coding genes with an average length of 394 amino acids. Data dissemination. The genome assembly is available via NCBI GenBank, the BCM-HGSC webpage, and Agripestbase at KSU. Additionally, BLAST resources are available at all three sites, and a GMOD-based browser for viewing the annotated assembly is also available at Agripestbase. Agripestbase is also running the community annotation of the Hessian fly with the GMOD Apollo annotation tool. PARTICIPANTS: Nothing significant to report during this reporting period. TARGET AUDIENCES: The intermediate Hessian fly assembly has already been used extensively by molecular Hessian fly laboratories to accelerate their research into the Hessian fly. The number of investigators is growing as the annotation consortium scales up with interest. Our current target audience is entomologists, molecular insect scientists and plant pathologists, but upon publication we hope to bring wheat breeders and growers into the knowledge circle, so that molecular information about gall formation can be used in the quest to generate wheat strains with long-term resistance to M. destructor. PROJECT MODIFICATIONS: Nothing significant to report during this reporting period.
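The coverage figures in these outputs follow the usual definitions: sequence coverage is the total number of bases sequenced divided by the genome size, and clone (physical) coverage is the total genomic span bridged by the paired-end clones divided by the genome size. The minimal Python sketch below reproduces the 23X and 41X values under one stated assumption, the 158 Mb genome size quoted in the 2008-09 report. The read count and mean clone span it prints are derived here for illustration and are not figures from the report.

```python
def fold_coverage(total_bases: float, genome_size: float) -> float:
    """Fold coverage: bases sequenced (or total clone span) / genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


# Assumption: genome size of 158 Mb, the estimate given in the 2008-09 report.
GENOME_SIZE = 158_000_000

# Sequence coverage from the 454 Titanium fragment reads.
FRAGMENT_BASES = 3_694_907_229
seq_cov = fold_coverage(FRAGMENT_BASES, GENOME_SIZE)

# Clone (physical) coverage: summed span of the 3 kb paired-end clones,
# i.e. the genomic distance bridged by each pair, not the bases read.
CLONE_SPAN = 6.48e9
N_PAIRS = 2_595_291
clone_cov = fold_coverage(CLONE_SPAN, GENOME_SIZE)

# Derived quantities (inferred here, not stated in the report).
n_reads = FRAGMENT_BASES / 323.2   # implied number of fragment reads
mean_span = CLONE_SPAN / N_PAIRS   # implied mean span, if the count is of pairs

print(f"sequence coverage: {seq_cov:.1f}X")        # 23.4X
print(f"clone coverage: {clone_cov:.1f}X")         # 41.0X
print(f"fragment reads: {n_reads / 1e6:.1f} M")    # 11.4 M
print(f"mean clone span: {mean_span:,.0f} bp")     # 2,497 bp
```

The same function applies to the 20 kb libraries once their total clone span is known; that span is not given in the report, so it is not computed here.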
Impacts Whilst we have completed our specific aim of annotating all of the Hessian fly genes, our broader goal is to greatly accelerate Hessian fly (M. destructor) research. The Hessian fly is a wheat pest that lays eggs on young wheat plants. The larvae that hatch trick the plant into growing a gall around them and feeding the growing insect, to the detriment of the plant, which suffers stunted growth and poor crop yields. We now have a list of all 13,284 predicted proteins, among which are those that may enable gall formation. The gene list has already accelerated research into the genes and processes causing gall formation. Small Secreted Salivary Gland Proteins (SSSGPs) have been identified by the Stuart Lab at Purdue University through genome mapping, made much faster by the availability of these sequences. The Chen Lab at KSU has additionally found hundreds of these genes in the genome sequence based on sequence similarity. Expression levels from the RNA-seq data generated here show that these genes are expressed at the first-instar larval stage, at the onset of gall formation. Additional analysis of the genome is focusing on sex determination and on host choice, by comparing sequences from a closely related species that prefers barley to wheat. A small consortium is being formed around the Hessian fly sequence to study all aspects of Hessian fly biology and to prepare multiple publications.
Publications
- No publications reported this period
Progress 02/01/09 to 01/31/10
Outputs OUTPUTS: Progress report: Delays are due to a decision to wait for the 454 Titanium platform and to assembly difficulties. We have completed all sequence generation goals. Here are the original objectives, interspersed with our current progress. OBJECTIVES 1. Generate raw sequence data representing 12-fold coverage of the Hessian fly: We generated 23X coverage of the Hessian fly genome on the 454 platform, with an average read length of 323.2 bp, and an additional 12X coverage on the Illumina platform. 2. Generate 32X "clone" coverage paired-end data with 3 kb and 10 kb insert sizes: We generated 41X clone coverage of the Hessian fly genome with 3 kb inserts and 338X clone coverage with 20 kb inserts. 3. Assemble 2 Gb of raw 454 GS-FLX sequence reads and paired-end data into sequence scaffolds of ordered and oriented contigs, followed by placement on the existing physical map. Results: A version 0.5 assembly is available on the BCM-HGSC website (link below) with a contig N50 of 9.8 kb and a scaffold N50 of 271 kb. Unfortunately, with a contig N50 of only 9.8 kb, the majority of genes are unlikely to be contained within single contigs. An improved assembly is being prepared; its current statistics include a contig N50 of 14.1 kb and a scaffold N50 of 1.06 Mb. Whilst these statistics are significantly better, we require further improvement in the contig N50 before accepting a final assembly (the improved scaffold N50 is more than sufficient). Our aim is to increase the contig N50 beyond 20 kb, to ensure that most genes are contained within a single contig. 4. Generate ~1,200,000 EST sequences from a variety of Hessian fly tissues: One Illumina paired-end lane with 110 bp read length produced 7.15 million clones, each with 220 bp of raw sequence data. The vastly increased depth of sequencing on the Illumina platform allows annotation of a higher percentage of Hessian fly genes.
We are currently (March 2010) performing additional EST sequencing using RNA from mixed-sex eggs, male and female mid-stage larvae, and mixed-sex late-stage larvae. 5. Produce an automated annotation of the assembled Hessian fly genome sequence: We are waiting for the final assembly and additional EST data before starting work on this aim. 6. Deposit data in public databases: We are in the process of placing all raw sequence data into the short read trace archive. The initial version of the assembly is already available to the public via the HGSC website at: http://www.hgsc.bcm.tmc.edu/project-species-i-Hessian_fly.hgscpageLocation=Hessian_fly . As additional assemblies and annotations become available, they will also be made available to the public as described in the original grant. PARTICIPANTS: Nothing significant to report during this reporting period. TARGET AUDIENCES: Nothing significant to report during this reporting period. PROJECT MODIFICATIONS: Nothing significant to report during this reporting period.
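The contig and scaffold N50 values quoted in these reports use the standard definition: sort the sequences from longest to shortest and report the length at which the running total first reaches half of all assembled bases. This is also why a contig N50 above 20 kb serves as the target: it guarantees that at least half of the assembled bases lie in contigs of 20 kb or longer. The minimal Python sketch below assumes a hypothetical toy list of contig lengths (not Hessian fly data); the helper name `fraction_in_contigs_at_least` is illustrative and not part of any assembler.

```python
def n50(lengths: list[int]) -> int:
    """Length L such that sequences of length >= L hold at least half
    of all assembled bases (works for contigs or scaffolds)."""
    if not lengths:
        raise ValueError("no sequences given")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable: running sum always reaches total")


def fraction_in_contigs_at_least(lengths: list[int], min_length: int) -> float:
    """Fraction of assembled bases lying in contigs of at least min_length bp."""
    total = sum(lengths)
    return sum(x for x in lengths if x >= min_length) / total


# Hypothetical toy contig lengths in bp (not Hessian fly data).
contigs = [25_000, 20_000, 14_000, 9_000, 6_000, 4_000, 2_000]

print(n50(contigs))                                   # 20000
frac = fraction_in_contigs_at_least(contigs, 20_000)
print(f"{frac:.4f} of bases in contigs >= 20 kb")     # 0.5625
```

The same `n50` function gives the scaffold N50 when passed scaffold lengths instead of contig lengths.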
Impacts The intermediate Hessian fly assembly has already been used extensively by Dr. Stuart's laboratory and other laboratories to accelerate their research into the Hessian fly. In particular, Dr. Stuart's work to clone susceptibility and resistance genes has been greatly accelerated. We expect this impact to grow as the final assembly and annotation are released and advertised more broadly.
Publications
- No publications reported this period
Progress 02/01/08 to 01/31/09
Outputs OUTPUTS: Our year one goals for the Hessian fly genome project were: 1, to generate 12X raw sequence coverage of the Hessian fly genome on the 454 FLX platform; 2, to generate 32X "clone coverage" of the genome in paired-end data; and 3, to assemble these sequence reads into a genome sequence of ordered and oriented contigs. Year 2 goals are to identify and annotate Hessian fly genes. One biological complication (which, to be honest, should have been foreseen) has delayed the project. The Hessian fly has an unusual sex determination system, wherein females give birth to either all-male or all-female offspring. To allow the creation of an inbred line for sequencing and assembly, Jeff Stuart identified a female that produced mostly male but some female offspring, and an inbred line was created that is >95% male. Unfortunately, the Hessian fly has two X chromosomes accounting for approximately 40% of the total genome; in males these would be at half coverage (6X) under our original sequencing strategy, which would likely produce smaller contigs. To overcome this limitation, we used an updated 454 chemistry (XLR), which produces longer reads and more data per run, to produce 20X sequence coverage of the Hessian fly genome, ensuring that the X chromosomes will have at least 10X sequence coverage. The reagents for this upgrade only became widely available in September 2008, which caused a 5-6 month delay. Eleven XLR runs were performed (10 of which were successful), generating a total of 3,582 Mb of raw sequence, or 22.6X coverage of the 158 Mb genome, fulfilling objective one of the grant. Of these, five 454-XLR runs were of paired-end sequence libraries, and these produced 38.75X paired-end coverage of the Hessian fly genome where both ends could be mapped within the initial assemblies.
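The reasoning about X-chromosome coverage can be made explicit with a little arithmetic. The report's rule of thumb is that X sequences in a male sample receive half the nominal coverage, which is why 20X nominal coverage guarantees at least 10X on the X. A slightly finer estimate follows from one assumption, consistent with the report's "half coverage" statement: male cells carry one copy of the X per two copies of each autosome, so reads fall on X and autosomal bases in a 1:2 ratio. Under that assumption, with X making up a fraction x of the genome, X coverage is nominal / (2 - x), which lies above the nominal / 2 floor. The minimal Python sketch below assumes x = 0.4 from the report and ignores the <5% of female individuals; the function name `male_coverage` is hypothetical.

```python
def male_coverage(nominal: float, x_fraction: float = 0.4) -> tuple[float, float, float]:
    """Estimate per-chromosome coverage in a sample of male flies.

    nominal    -- total bases sequenced divided by the haploid genome size
    x_fraction -- share of the genome on the X chromosomes (~40% per report)

    Returns (x_lower_bound, x_estimate, autosome_estimate).
    """
    if not 0.0 < x_fraction < 1.0:
        raise ValueError("x_fraction must lie strictly between 0 and 1")
    # Conservative rule of thumb from the report: X at half the nominal coverage.
    x_lower_bound = nominal / 2
    # Copy-number model (assumption): one X copy per two autosome copies, so
    # DNA per cell is (2 - x) haploid-genome units and X gets 1 unit's share.
    x_estimate = nominal / (2 - x_fraction)
    return x_lower_bound, x_estimate, 2 * x_estimate


for nominal in (12, 20, 22.6):
    low, x_est, auto_est = male_coverage(nominal)
    print(f"{nominal}X nominal: X >= {low:.1f}X (rule of thumb), "
          f"X ~{x_est:.1f}X, autosomes ~{auto_est:.1f}X")
```

As a check on the model, the genome-weighted average of the two estimates, 0.6 × autosome + 0.4 × X, recovers the nominal coverage.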
Our third goal for the year is the assembly of the raw sequence into ordered and oriented contigs. Because of the delay in obtaining sequence, the assembly process was only started in January 2009, and at this stage we can only report an intermediate assembly. This initial assembly has a total size of 126 Mb and an N50 contig size of 3.5 kb. Initial assembly details:
Number of contigs (>500 bp) = 53,939
Number of bases = 126,251,682
Average contig size = 2,340 bp
N50 contig size = 3,509 bp
Largest contig size = 62,836 bp
Q40-plus bases = 118,284,880 (93.69%)
Q39-minus bases = 7,966,802 (6.31%)
Whilst this assembly is clearly not good enough, we will be releasing it as an initial assembly on the Human Genome Sequencing Center website and making it searchable via BLAST, to aid Hessian fly researchers and anyone else with an interest. A fuller, improved assembly is in progress, and we expect to produce a higher-quality product in the coming year (likely in less than 6 months). Unfortunately, this can only count as partial fulfillment of goal three for the year, and we hope to catch up over year two of the grant. PARTICIPANTS: Stephen Richards (PD) directed the accumulation of sequence data for this project and performed the initial assembly of the sequence data. Jeff Stuart (co-PD) produced and provided pure isolated Hessian fly DNA from an inbred Hessian fly line. TARGET AUDIENCES: Not relevant to this project. PROJECT MODIFICATIONS: We produced sequence using an upgraded version of the 454 pyrosequencing platform (upgrade from FLX to XLR). This allowed more sequence to be generated at the same cost, but delayed the project by 6 months. We increased the sequence coverage of the Hessian fly genome from 12X to 22X, to enable proper assembly of the X chromosomes (40% of the genome) from an inbred line of mostly male individuals.
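The fields in the initial assembly details (contig count above 500 bp, total bases, average and largest contig size, and Q40-plus and Q39-minus bases) are simple aggregates over the contig set. The minimal Python sketch below assumes each contig record carries its length and its count of bases at consensus quality Q40 or higher; the `Contig` type and the toy data are hypothetical. A final check confirms that the reported average (2,340 bp, apparently truncated from about 2,340.6) and the 93.69% Q40 figure follow from the reported totals.

```python
from typing import NamedTuple


class Contig(NamedTuple):
    length: int     # contig length in bp
    q40_bases: int  # bases whose consensus quality is at least Q40


def assembly_summary(contigs: list[Contig], min_length: int = 500) -> dict:
    """Summary fields in the style of the initial-assembly details."""
    # Keep contigs longer than min_length, matching the "> 500 bp" label.
    kept = [c for c in contigs if c.length > min_length]
    if not kept:
        raise ValueError("no contigs pass the length filter")
    total = sum(c.length for c in kept)
    q40 = sum(c.q40_bases for c in kept)
    return {
        "n_contigs": len(kept),
        "n_bases": total,
        "avg_contig": total // len(kept),  # truncated to whole bp
        "largest": max(c.length for c in kept),
        "pct_q40_plus": 100 * q40 / total,
        "pct_q39_minus": 100 * (total - q40) / total,
    }


# Hypothetical toy contigs; the 400 bp contig falls below the cutoff.
toy = [Contig(3_000, 2_900), Contig(1_200, 1_100), Contig(400, 300)]
print(assembly_summary(toy))

# Consistency check on the reported totals for the initial assembly.
reported_avg = 126_251_682 // 53_939
reported_pct_q40 = 100 * 118_284_880 / 126_251_682
print(reported_avg, f"{reported_pct_q40:.2f}%")  # 2340 93.69%
```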
Impacts The availability of the genome sequence of the Hessian fly is the first step towards the comprehensive identification, mapping and characterization of Hessian fly genes. We are currently making this genomic information available and searchable via web-based BLAST on our website, to accelerate Hessian fly research into virulence genes and other genes through which the pest reduces crop yields.
Publications
- No publications reported this period