Source: GEORGETOWN UNIVERSITY submitted to
GENOME INFORMATICS FOR AGRICULTURALLY IMPORTANT HYMENOPTERA SPECIES AND THEIR PATHOGENS
Sponsoring Institution
National Institute of Food and Agriculture
Project Status
TERMINATED
Funding Source
Reporting Frequency
Annual
Accession No.
0213025
Grant No.
2008-35302-18804
Project No.
DCR-2007-04622
Proposal No.
2007-04622
Multistate No.
(N/A)
Program Code
51.2C
Project Start Date
Jan 15, 2008
Project End Date
Jan 14, 2010
Grant Year
2008
Project Director
Elsik, C. G.
Recipient Organization
GEORGETOWN UNIVERSITY
37th & O STREETS NW
WASHINGTON,DC 20057
Performing Department
(N/A)
Non Technical Summary
Hymenopteran insects play vital roles in agricultural crop production. These roles include pollinators, such as bees, and beneficial control insects, such as parasitic wasps. The recent widespread appearance of honey bee Colony Collapse Disorder (CCD) has reminded us of the magnitude of the impact of honey bee pollination on crop production. An understanding of factors that affect honey bee pollination requires an understanding of bee biology, including implications of the honey bee's complex social organization. The genome sequences of the hymenopteran insects, honey bee and jewel wasp, will have a tremendous impact on our understanding of genetic mechanisms underlying biological features important to agriculture. For example, an understanding of genes involved in the honey bee immune system will lead to better management practices, preventing losses due to phenomena such as colony collapse disorder. The genome sequences provide information for all the genes in an organism, but this information must be made accessible to insect biologists. The purpose of this project is to further develop a database and web-based tools that will allow researchers to access and integrate the hymenoptera genome sequences and other genomics data. The outcome of the project will be genome informatics resources that will allow researchers to share, access and analyze genomics data. These resources will immediately enhance the value of ongoing honey bee and wasp genomics projects in the understanding of insect biology, enabling discoveries that will lead to improved management practices, such as strategies to maintain pollinator populations and to make the best use of biological control organisms.
Animal Health Component
(N/A)
Research Effort Categories
Basic
100%
Applied
(N/A)
Developmental
(N/A)
Classification

Knowledge Area (KA)    Subject of Investigation (SOI)    Field of Science (FOS)    Percent
211                    3010                              1080                      60%
211                    3110                              1080                      40%
Goals / Objectives
The objective of this project is to develop a comprehensive model organism database for hymenopteran insects that will allow researchers to leverage genetic, genome sequence, and gene expression data, as well as the biological knowledgebase of other model organisms. The rationale that underlies the proposed research is that we cannot realize the full potential of the genome sequences and other genomic data generated using USDA and other funding sources without integrating the data and linking it to organism biology in a resource that will be accessible and easily utilized by biologists and agricultural researchers. We will pursue the following specific goals: 1) Extend an existing informatics resource (BeeBase) by incorporating new tools, including new web accessible search and graphical interfaces and 2) Maintain and enhance sequence annotations, with an essential level of support that exceeds what is possible at the genome sequencing center, NCBI and other resources that are not hymenoptera specific. The outcome of this project, the Hymenoptera Genome Database, will significantly enhance the value of the honey bee (Apis mellifera) and jewel wasp (Nasonia vitripennis) genome sequencing projects. The honey bee genome sequence will have a tremendous impact on our knowledge of honey bee biology for three reasons. First, the sequence will facilitate comparison of honey bee with other model organisms, allowing the honey bee model to leverage the extensive biological knowledgebase for models such as Drosophila. Second, genome sequencing allows the emergence of new experimental systems for organisms that do not have powerful systems for breeding and genetics. Third, the genome sequence will provide a reference that will allow the integration of other genomic resources. For example, quantitative trait locus (QTL) data can be combined with gene expression data allowing for functional inference more powerful than either alone. 
The genome sequence of Nasonia vitripennis provides a new hymenoptera genome resource for powerful comparative inferences into honey bee gene function and structure. Furthermore, Nasonia itself has a valuable role in agricultural research as a model for beneficial parasitic wasps, which are used to control insect pests. The combination of these two genomes into a single web-accessible resource is not only a cost-effective approach to allow biologists and agricultural researchers to leverage the genome information, but will synergistically enhance the value of genomic data for each species by facilitating cross-species comparisons. Integrating QTL, SNP, haplotype block, expression and sequence data will significantly speed up the identification of genes that contribute to agriculturally important traits. The deliverable is a web-accessible genome resource that will enable researchers to effectively exploit hymenoptera genome sequences. Genomics data will be collected from the research community and presented in a way that will allow the entire community to benefit from genomics projects carried out by the individual labs.
Project Methods
We will accomplish our first objective using open-source software, including components of the Generic Model Organism Database (GMOD) project. The components to be integrated include the GMOD genome database schema (Chado), genome viewers (GBrowse), single nucleotide polymorphism (SNP) and haplotype block viewers, a graphical interface to examine comparative synteny (SynBrowse), gene expression annotation query and cross-species comparison functions, annotation tools (Apollo), community annotation submission websites, and gene pages with annotations, database cross-references, and links to other databases. Some of the software has already been implemented as part of BeeBase using other sources of funding (Chado, GBrowse, gene pages). This project focuses on implementing new tools for honey bee and wasp, implementing existing tools for wasp, incorporating new data that will result from the Honey Bee Phase II sequencing project, and performing computations required for cross-species comparisons. We will accomplish our second objective by maintaining a single reference gene set, maintaining mappings to new assembly releases, curating data from the community, and developing new controlled vocabularies to computationally describe honey bee and Nasonia genomics data. Our strategy is to carry the work forward where the sequencing projects end. We will host a community annotation submission site to serve as the community data portal for the long term. In addition, we will provide annotations for sequences that do not have annotations available elsewhere. This is often the case for expressed sequence tag (EST) sequences that do not overlap coding regions, yet are often used as probes in genome-wide gene expression studies. In addition to annotations, we will curate data for expression, genetic maps, QTL and SNP. We will perform computations to support community genomics projects, such as annotating microarray elements with Gene Ontology.
We will also construct chromosome superscaffold assemblies using community input. We will incorporate additional data of interest to honey bee and Nasonia researchers, such as sequences for additional hymenoptera species and pathogens.
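As an illustration of what placing scaffolds on a chromosome superscaffold involves, the following minimal Python sketch converts a feature's coordinates on an individual scaffold into coordinates on a superscaffold built from an ordered, oriented list of scaffolds. It is not the project's actual pipeline: the names Placement, build_offsets and lift_interval, the 1-based inclusive coordinate convention, and the fixed 100-base gap between scaffolds are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Placement:
    """One scaffold placed on a superscaffold (hypothetical structure)."""
    scaffold: str
    length: int
    orientation: int  # +1 = forward, -1 = reverse-complemented


def build_offsets(layout: List[Placement], gap: int = 100) -> Dict[str, Tuple[int, Placement]]:
    """Return, for each scaffold, the number of superscaffold bases that precede it.

    The gap size is an assumed placeholder for unsequenced bases between scaffolds.
    """
    offsets: Dict[str, Tuple[int, Placement]] = {}
    cursor = 0
    for placement in layout:
        offsets[placement.scaffold] = (cursor, placement)
        cursor += placement.length + gap
    return offsets


def lift_interval(offsets: Dict[str, Tuple[int, Placement]],
                  scaffold: str, start: int, end: int) -> Tuple[int, int, int]:
    """Map a 1-based inclusive scaffold interval onto the superscaffold.

    Returns (new_start, new_end, orientation); an orientation of -1 means the
    feature's strand must be flipped on the superscaffold.
    """
    offset, placement = offsets[scaffold]
    if not (1 <= start <= end <= placement.length):
        raise ValueError("interval lies outside the scaffold")
    if placement.orientation == 1:
        return offset + start, offset + end, 1
    # Reverse-oriented scaffold: positions are mirrored within the scaffold span.
    new_start = offset + placement.length - end + 1
    new_end = offset + placement.length - start + 1
    return new_start, new_end, -1
```

In a layout of a 1,000-base forward scaffold followed by a 500-base reversed scaffold, the second scaffold occupies superscaffold positions 1101 through 1600, so its first ten bases map to positions 1591 through 1600 with the strand flipped.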

Progress 01/15/08 to 01/14/10

Outputs
OUTPUTS: We have expanded BeeBase to the Hymenoptera Genome Database by incorporating the genome of a parasitoid wasp, Nasonia vitripennis. We have designed web pages for the Hymenoptera Genome Database using the Drupal content management platform. This includes the Hymenoptera Genome Database home page (http://HymenopteraGenome.org), and all pages of BeeBase (http://BeeBase.org) and NasoniaBase (http://NasoniaBase.org). In addition, wiki pages have been added so that users may submit discussions and comments. We have developed genome databases for Nasonia vitripennis and Apis mellifera using the GMOD Chado schema. We have developed a Nasonia genome browser using GBrowse, with tracks for the official consensus gene set, other gene prediction sets, cDNAs, ESTs, homologs, repetitive elements, pseudogenes, SNPs and markers. We have created a consensus gene set for Nasonia using GLEAN. We selected the best set after running GLEAN seven times using different combinations of the following gene predictions and alignments: NCBI RefSeq, NCBI Gnomon (excluding RefSeq), Fgenesh, Fgenesh++, Geneid, Augustus, aligned Swissprot homologs and ESTs. A "gold standard" set was created using 82 manually annotated coding sequences contributed by Hugh Robertson to evaluate the seven GLEAN sets. We also evaluated each GLEAN set for agreement with EST splice sites. We have predicted Nasonia microRNAs based on homology to known microRNAs, conservation with Apis mellifera, and alignment with reads from 454 sequencing of Nasonia small RNA libraries. We have implemented community annotation tools using the Apollo Annotation Editor for Nasonia and Apis mellifera. The Apollo client software, installed on the researcher's desktop, was configured to connect remotely to the Chado PostgreSQL database.
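The comparison of candidate GLEAN sets against a gold standard, described above, can be sketched in Python as follows. The report does not state which agreement metric was used, so this example assumes a simple one: the fraction of gold-standard coding sequences whose exon structure is reproduced exactly by a candidate set. The function names and the representation of a gene model as a set of (sequence, start, end) exons are hypothetical.

```python
from typing import Dict, FrozenSet, Set, Tuple

# A coding exon as (sequence id, start, end); a gene model is a set of exons.
Exon = Tuple[str, int, int]
Model = FrozenSet[Exon]


def exact_match_fraction(gold: Dict[str, Model], predicted: Set[Model]) -> float:
    """Fraction of gold-standard models reproduced exactly by a predicted set.

    Assumed metric for illustration; other measures (exon-level or
    nucleotide-level sensitivity and specificity) are equally plausible.
    """
    if not gold:
        raise ValueError("gold standard is empty")
    matched = sum(1 for model in gold.values() if model in predicted)
    return matched / len(gold)


def best_gene_set(gold: Dict[str, Model],
                  candidates: Dict[str, Set[Model]]) -> Tuple[str, float]:
    """Return the name and score of the candidate set that best matches gold."""
    scores = {name: exact_match_fraction(gold, models)
              for name, models in candidates.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]
```

A second criterion, such as agreement with EST splice sites, could be scored the same way and combined with this one when ranking the candidate sets.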
We have continued to support the honey bee genome assembly by manually editing gene models mapped to assembly Amel_4.0, and by creating additional genome browsers to show superscaffolds (manually improved scaffold assemblies) for honey bee chromosomes 7 through 11 using data submitted by Hugh Robertson, similar to the superscaffolds we had completed previously for chromosomes 12 through 16. We have added a Bee Pest and Pathogens information home page to BeeBase, with information about each pest and pathogen genome project, including Ascosphaera apis, Nosema ceranae, Paenibacillus larvae and Varroa destructor. We created a genome browser for N. ceranae using published data. Tracks include gene predictions, E. cuniculi BLASTX homolog alignments, ribosomal BLASTN alignments, and tRNAs. We have updated the P. larvae genome browser with six new data tracks, three of which come from NCBI supported data: protein-coding gene predictions, non-coding RNA gene predictions and miscellaneous features, with links to NCBI. The remaining three tracks are ab initio gene predictions, which include two tracks produced using GeneMarkS (with and without RBS parameters in the model) and one track produced using MetaGeneAnnotator. We created BLAST databases and websites for A. apis, N. ceranae and P. larvae. PARTICIPANTS: The project took place at Georgetown University under the direction of PD Christine Elsik. Personnel on this project included postdocs Darren Hagen and Monica Munoz-Torres, and programmers Justin Reese and Jay Sundaram. We collaborated extensively with members of the Nasonia Genome Working Group, in particular Jack Werren and postdoc Chris Desjardins (University of Rochester), Juergen Gadau (Arizona State University) and Stephen Richards (Baylor College of Medicine Human Genome Sequencing Center). We hosted Chris Desjardins at Georgetown University for a week to train in bioinformatics programming to develop annotation tracks and to annotate using Apollo.
TARGET AUDIENCES: We developed bioinformatics resources designed to target the hymenoptera research community. We held an annotation training workshop using Nasonia data at the Arthropod Genomics Symposium in Kansas City in April 2008. We distributed annotation tutorials via a listserv, and communicated with research community members about annotation via email and conference calls. PROJECT MODIFICATIONS: We revised our objectives to meet the needs of the research community. We replaced the goals to develop gene pages, synteny browser and expression annotation (objective 1), and to develop GO terms (objective 2) with increased community annotation support, development of a consensus gene set, microRNA and conserved element prediction, and small RNA library analysis.

Impacts
We have combined genomic data for two hymenoptera species, Apis mellifera and Nasonia vitripennis, into a single resource to make it easily accessible to hymenoptera researchers. We have provided resources that enabled the Nasonia Genome Working Group to analyze the Nasonia genome to gain biological insights, leading to publication in a high impact journal. We have trained many members of the hymenoptera research community in annotation, providing many researchers with their first experience in bioinformatics. We have made datasets produced by the Nasonia Genome Working Group publicly available on NasoniaBase download pages.

Publications

  • Werren, J.H., Richards, S., Desjardins, C.A., Niehuis, O., Gadau, J., Colbourne, J.K., Beukeboom, L.W., Desplan, C., Elsik, C.G., et al. 2010. Functional and evolutionary insights from the genomes of three parasitoid Nasonia species. Science 327: 343-348.
  • Munoz-Torres, M.C., Reese, J.T., Childers, C.P., Bennett, A.K., Sundaram, J.P., Childs, K.L., Anzola, J.M., Milshina, N.V. and Elsik, C.G. 2011. Hymenoptera Genome Database: integrated community resources for insect species of the order Hymenoptera. Nucleic Acids Research 39:D658-D662.
  • Reese, J.T., Childers, C.P., Hagen, D.E., and Elsik, C.G. The Hymenoptera Genome Database. Poster Abstract, Plant and Animal Genome Conference, San Diego, CA, January 10-14, 2009.
  • Reese, J.T., Childers, C.P., Hagen, D.E., Munoz-Torres, M.C. and Elsik, C.G. The Hymenoptera Genome Database. Poster Abstract, Biology of Genomes Meeting, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, May 5-9, 2009.
  • Hagen, D.E., Evans, J.D., Werren, J.H. and Elsik, C.G. Transcriptional evidence of microRNAs in the jewel wasp Nasonia vitripennis. Poster Abstract, Arthropod Genomics Symposium, Kansas City, MO, June 11-14, 2009.
  • Munoz-Torres, M.C., Reese, J.T., Childers, C.P., Hagen, D.E. and Elsik, C.G. The Hymenoptera Genome Database. Poster Abstract, Arthropod Genomics Symposium, Kansas City, MO, June 11-14, 2009.
  • Munoz-Torres, M.C., Reese, J.T., Childers, C.P., Bennett, A.K., Sundaram, J.P., Vile, D.C. and Elsik, C.G. The Hymenoptera Genome Database. Poster Abstract. Plant and Animal Genome Conference, San Diego, CA. Jan. 9-13, 2010.