Source: UNIVERSITY OF DELAWARE submitted to NRP
KNOWLEDGE EXTRACTION AND ANNOTATION IN A GRID BASED BIOINFORMATICS ENVIRONMENT
Sponsoring Institution
National Institute of Food and Agriculture
Project Status
COMPLETE
Funding Source
Reporting Frequency
Annual
Accession No.
0212704
Grant No.
2008-35205-18734
Cumulative Award Amt.
(N/A)
Proposal No.
2007-04195
Multistate No.
(N/A)
Project Start Date
Jan 1, 2008
Project End Date
Dec 31, 2011
Grant Year
2008
Program Code
[43.0]- (N/A)
Recipient Organization
UNIVERSITY OF DELAWARE
(N/A)
NEWARK,DE 19717
Performing Department
ANIMAL & FOOD SCIENCE
Non Technical Summary
A major impediment to using information derived from genomic science is the difficulty of retrieving information from resources distributed across the country and around the world. This grant will fund work to address this problem by providing an Internet-based query system that accesses databases housed at different sites around the US. A user will be able to retrieve information from databases housed in Delaware, Mississippi, Arizona and New York through a single interface. The results will be provided in a user-friendly format that integrates the information regardless of its source. A further outcome of this work will be a web-accessible reference for biological pathways, along with new tools for extracting knowledge from the literature. All of the tools developed will be accessible via the Internet, and the software will be freely available. This effort will provide tools that are useful to both academic and applied researchers who use genomic science to address concerns of the agricultural community.
Animal Health Component
(N/A)
Research Effort Categories
Basic
50%
Applied
(N/A)
Developmental
50%
Classification

Knowledge Area (KA) | Subject of Investigation (SOI) | Field of Science (FOS) | Percent
304 | 3220 | 1080 | 100%
Knowledge Area
304 - Animal Genome;

Subject Of Investigation
3220 - Meat-type chicken, live animal;

Field Of Science
1080 - Genetics;
Goals / Objectives
1. Develop a grid-based system for retrieving information from distributed database resources.
2. Develop a metabolic and signal transduction pathway database for the chicken using the Reactome model.
3. Develop tools for knowledge extraction from the literature to support genomic annotation.
Project Methods
Objective 1 will be met using an existing tool, BioMart, to integrate data across multiple remote platforms. We will extend the functionality of BioMart by introducing computational inference based on ontological information. Objective 2 will be met using approaches pioneered by the Reactome group for human pathways; we will manually curate pathways predicted in the chicken based on the genomic sequence and published experimental data. Objective 3 will also make use of an existing tool, Textpresso, to extract information from abstracts and publications. We will extend Textpresso's capabilities to include recognition of names, species and ontology terms, to support the work of annotators.
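The methods above mention extending BioMart with "computational inference using ontological information" but do not say what form that inference takes. One common form, shown here only as a hypothetical minimal sketch, is propagating each gene's direct ontology annotations to every ancestor term along is_a links (the "true path" rule used with the Gene Ontology). The toy hierarchy, the gene name GENE_A and the helper functions `ancestors` and `infer_annotations` are illustrative assumptions, not project data and not part of BioMart's API.

```python
from collections import deque

# Hypothetical, simplified toy ontology: child term -> list of parent terms
# (is_a edges). Illustrative only; not taken from any real ontology release.
IS_A = {
    "protein_kinase_activity": ["kinase_activity"],
    "kinase_activity": ["transferase_activity"],
    "transferase_activity": ["catalytic_activity"],
    "catalytic_activity": [],
}


def ancestors(term, is_a):
    """Return every term reachable from `term` by following is_a edges upward."""
    seen = set()
    queue = deque(is_a.get(term, []))
    while queue:
        parent = queue.popleft()
        if parent not in seen:
            seen.add(parent)
            queue.extend(is_a.get(parent, []))
    return seen


def infer_annotations(direct, is_a):
    """Expand each gene's direct annotations with all ancestor terms (true path rule)."""
    inferred = {}
    for gene, terms in direct.items():
        expanded = set(terms)
        for term in terms:
            expanded |= ancestors(term, is_a)
        inferred[gene] = expanded
    return inferred


if __name__ == "__main__":
    # GENE_A is a hypothetical gene annotated directly to one specific term.
    direct = {"GENE_A": {"protein_kinase_activity"}}
    print(sorted(infer_annotations(direct, IS_A)["GENE_A"]))
```

With the inferred set in hand, a query for genes annotated to the general term catalytic_activity would also return GENE_A, even though it was only annotated to the more specific term.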

Progress 01/01/08 to 12/31/11

Outputs
OUTPUTS: The major outputs of this project include software tools for text mining the scientific literature for information about genes; incorporation of these tools into the AgBase curation and annotation pipeline; development of appropriate data models for incorporating livestock data into the Reactome pathway database; an analysis tool, Pathfigure, for evaluating transcriptome data in the context of the Reactome database; and a web-based transcriptome database housing RNA-seq expression data.
Arizona: Work during this past year focused on enhancing the capabilities of Birdbase and improving its integration with avian genomics resources, including GEISHA, AgBase, AGNC and, most recently, the multi-genome browser CoGe. The Birdbase database was redesigned to provide a more robust platform for integrating the anticipated flood of genomic data from other bird species; this significant effort is now complete. Data for turkey and zebra finch are being incorporated into Birdbase, and data from 50 bird genomes recently sequenced by the Beijing Genomics Institute have been uploaded to CoGe. Integration of the various resources into the Birdbase user interface is ongoing.
Work at NYU focused on using the Reactome data model and web tools to annotate the functions of key Gallus proteins by representing them as participants in sequences of reactions. Protein annotations are linked to the canonical representations of these proteins in UniProt or Ensembl; small molecules are linked to their reference forms in the ChEBI database.
Pathfigure: Given differential expression data from an experiment, Pathfigure predicts which Gallus Reactome pathways are differentially active. Based on this prediction, Pathfigure automatically generates graphs that show the relationships among the differentially expressed proteins at four levels: pathway, expressed gene, reaction, and all participants.
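The report does not state how Pathfigure decides that a pathway is "differentially active". A widely used way to make that call is an over-representation test: ask whether a pathway contains more differentially expressed genes than chance would predict, scoring the overlap with the hypergeometric distribution. The sketch below illustrates only that general technique and is an assumption, not Pathfigure's documented method; the function names, pathway names and gene identifiers are hypothetical.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, K pathway genes, n drawn)."""
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1)) / total


def enriched_pathways(de_genes, pathways, universe, alpha=0.05):
    """Return (pathway, overlap, p-value) for pathways over-represented in de_genes.

    universe: set of all genes measured in the experiment.
    pathways: mapping of pathway name -> iterable of member genes.
    """
    de = set(de_genes) & universe
    N, n = len(universe), len(de)
    hits = []
    for name, members in pathways.items():
        members = set(members) & universe
        k = len(members & de)  # differentially expressed genes in this pathway
        if k == 0:
            continue
        p = hypergeom_sf(k, N, len(members), n)
        if p <= alpha:
            hits.append((name, k, p))
    return sorted(hits, key=lambda hit: hit[2])


if __name__ == "__main__":
    # Hypothetical data: 20 measured genes and two toy pathways of 5 genes each.
    universe = {f"G{i}" for i in range(20)}
    pathways = {
        "P_signal": [f"G{i}" for i in range(0, 5)],
        "P_other": [f"G{i}" for i in range(10, 15)],
    }
    de_genes = ["G0", "G1", "G2", "G3", "G15"]
    for name, k, p in enriched_pathways(de_genes, pathways, universe):
        print(name, k, f"{p:.4f}")  # P_signal 4 0.0049
```

When many pathways are tested at once, the p-values would also need a multiple-testing correction (for example, Benjamini-Hochberg), which this sketch omits for brevity.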
PD-Explorer: PD-Explorer (Plan Domain-Explorer) integrates general, domain-independent plan adaptation strategies with domain-specific formalized background knowledge (e.g., ontologies) to propose and evaluate hypothetical plans based on an incomplete planning domain model. BioPlanner is an instance of PD-Explorer for biological domains; it views a biological pathway (particularly a signal transduction pathway) as a plan, and information from sources such as Gallus Reactome as a currently incomplete planning domain model.
eGRAB: a tool for disambiguating gene names in text and retrieving all the scientific papers that mention a particular gene.
eGIFT: a tool for mining gene-related information from text.
PARTICIPANTS:
Carl J. Schmidt, Principal Investigator, University of Delaware
K. Vijay-Shanker, Co-PI, University of Delaware
Fiona McCarthy, Co-PI, Mississippi State University
Peter D'Eustachio, Co-PI, New York University
Parker Antin, Co-PI, University of Arizona
Keith Decker, Co-PI, University of Delaware
Li Jin, Doctoral Student, University of Delaware
Catalina Oana Tudor, Doctoral Student, University of Delaware
Anjana Saxena, Post-doctoral Fellow, New York University
Veronica Shamovsky, Post-doctoral Fellow, New York University
Lisa Matthews, Post-doctoral Fellow, New York University
TARGET AUDIENCES: Life scientists.
PROJECT MODIFICATIONS: Not relevant to this project.

Impacts
We have provided multiple web interfaces serving scientists interested in genomics and text mining, including:
Birdbase: http://birdbase.arizona.edu/birdbase/
Pathfigure: http://birdbase.udel.edu/pathfigure/
eGIFT: http://biotm.cis.udel.edu/eGIFT/
Gallus Reactome: http://gallus.reactome.org/
Birdbase Transcriptome: http://birdbase.udel.edu/birdbase_atlas/
We are currently using the platform developed with support from this award to host a comparative genomics site devoted to the 50 bird species sequenced by BGI.
Integration of Software Tools: eGIFT has been integrated with AgBase, developed by collaborators at Mississippi State University. eGRAB has been integrated with various other tools at the University of Delaware, such as RLIMS-P (for mining phosphorylation) and eFIP (for mining the impact of phosphorylation on subsequent interactions involving the phospho-protein).
This award directly supported the completion of two Ph.D. degrees at the University of Delaware.

Publications

  • Catalina O Tudor, Carl J Schmidt, K Vijay-Shanker. eGIFT: Mining Gene Information from the Literature. BMC Bioinformatics, 2010, 11:418.
  • Catalina O Tudor, Carl J Schmidt, K Vijay-Shanker. Mining for Gene-Related Key Terms. Third International Symposium on Semantic Mining in Biomedicine (SMBM), Turku, Finland, 2008, 157-160.
  • Catalina O Tudor, K Vijay-Shanker, Carl J Schmidt. Mining the Biomedical Literature for Genic Information. BioNLP Workshop in conjunction with ACL, 2008, 288-290.
  • Li Jin, Keith S Decker, Carl J Schmidt. BioPlanner: A Plan Adaptation Approach for the Discovery of Biological Pathways across Species. IAAI, 2009.
  • Li Jin, Keith S Decker. Ontology Oriented Exploration of an HTN Planning Domain through Hypotheses and Diagnostic Execution. Proceedings of the Workshop on Knowledge Engineering for Planning and Scheduling, at ICAPS 2010.
  • Li Jin. Stability Oriented Task-Structure Based Multi-Agent Re-Planning [extended abstract]. 2009.
  • Li Jin, Keith S Decker, A J Stachnik, Carl J Schmidt. Prediction of biological pathways with integrated information. Bioinformatics and Biomedicine Workshop (BIBMW), 2009.
  • Dalloul et al. Multi-platform next-generation sequencing of the domestic turkey (Meleagris gallopavo): genome assembly and analysis. PLoS Biol, 2010, 8(9): e1000475. doi:10.1371/journal.pbio.1000475
  • Burt DW, Carre W, Fell M, Law AS, Antin PB, Maglott DR, Weber JA, Schmidt CJ, Burgess SC, McCarthy FM. The Chicken Gene Nomenclature Committee report. BMC Genomics, 2009, 10(Suppl 2):S5.
  • Li Jin. Exploring Incomplete Planning Domain Knowledge through Hypothesis Generation and Diagnostic Execution. PhD Thesis, 2011.
  • Catalina O Tudor. Using Text Mining Techniques to Gather Gene-Specific Information from the Biomedical Literature. PhD Thesis, 2011.