Progress 10/01/12 to 07/01/13
Outputs
Target Audience:
Researchers and students in environmental health sciences.
Changes/Problems:
Nothing Reported
What opportunities for training and professional development has the project provided?
Over its lifetime, this project has provided significant opportunities for technical development to several software developers and has trained multiple scientists in data curation. It has also facilitated collaborations and expanded research opportunities in undergraduate courses.
How have the results been disseminated to communities of interest?
This resource is publicly available. All information and analysis tools are accessible and documented at http://ctdbase.org.
What do you plan to do during the next reporting period to accomplish the goals?
Nothing Reported
Impacts
What was accomplished under these goals?
Data Curation. CTD is the only publicly available database that provides curated data describing molecular mechanisms of action of chemicals and disease relationships. Currently, CTD provides over 820,000 curated interactions between more than 9,800 chemicals and 29,000 genes and proteins in over 450 species. CTD also presents more than 184,000 direct chemical-disease and 26,000 direct gene-disease relationships.
Data integration in CTD also enables novel inferred relationships to be made. For example, an inferred chemical-disease relationship is established via curated chemical-gene interactions (e.g., chemical A is associated with disease B because chemical A has a curated interaction with gene C, and gene C has a direct relationship with disease B). These inferred relationships help users develop hypotheses about the mechanisms underlying environmental diseases. We also developed novel statistical analyses of these inferences (King et al. PLoS One. 2012;7(11):e46524). Similarly, integration of curated chemical-gene-disease data with external data sets such as the Gene Ontology, pathways, and protein-protein interaction data provides insights into the functions and pathways affected by chemical exposures. Finally, curated information in CTD enables the research community to leverage broad-based legacy data, identify connections and patterns that might not otherwise be apparent, and use these insights in coordination with evolving technologies and emerging experimental approaches.
To ensure that our data remain current and as complete as possible, we modified our literature triaging protocol to include a) journal-centric curation and b) curation updates of priority chemicals. We conducted a test phase by prioritizing curation of articles published in 2009, 2010, and 2011 in “Toxicological Sciences”, “Chemico-Biological Interactions”, and “Environmental Health Perspectives”.
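The inference rule described above (chemical A is linked to disease B through a shared gene C) can be illustrated with a short Python sketch. The function name, the tuple-based data layout, and the toy records are assumptions made for illustration only; they are not CTD's actual implementation or data, and the sketch does not attempt the statistical ranking of inferences mentioned above.

```python
from collections import defaultdict


def infer_chemical_disease(chem_gene, gene_disease):
    """Return {(chemical, disease): set of shared genes} for transitive inferences.

    chem_gene:    iterable of (chemical, gene) curated interactions
    gene_disease: iterable of (gene, disease) direct relationships
    (Hypothetical layout chosen for this sketch.)
    """
    # Index diseases by gene so each chemical-gene pair can be joined quickly.
    diseases_by_gene = defaultdict(set)
    for gene, disease in gene_disease:
        diseases_by_gene[gene].add(disease)

    # A chemical is inferred to relate to every disease reachable through
    # one of its interacting genes; the genes are kept as supporting evidence.
    inferred = defaultdict(set)
    for chemical, gene in chem_gene:
        for disease in diseases_by_gene.get(gene, ()):
            inferred[(chemical, disease)].add(gene)
    return dict(inferred)


# Toy, made-up records for illustration only.
chem_gene = [
    ("chemical A", "gene C"),
    ("chemical A", "gene D"),
    ("chemical X", "gene D"),
]
gene_disease = [
    ("gene C", "disease B"),
    ("gene D", "disease B"),
    ("gene D", "disease Y"),
]

inferences = infer_chemical_disease(chem_gene, gene_disease)
for (chemical, disease), genes in sorted(inferences.items()):
    print(chemical, "->", disease, "via", sorted(genes))
```

Keeping the set of supporting genes for each inferred pair mirrors the idea that the shared genes form the basis of the inference and can be examined further.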
We also updated curation of many priority chemicals (e.g., bisphenol A, arsenic). Details of these curation modifications were published recently (Database (Oxford). 2012 Dec 6;2012:bas051).
Text mining innovation. The Critical Assessment of Information Extraction systems in Biology (BioCreAtIvE) challenge evaluation is a community-wide effort for evaluating text mining and information extraction systems for the biological domain. CTD staff have co-organized two workshops that bring together international groups to advance text-mining capabilities.
Ontology development. We continue to enhance our controlled vocabularies for CTD curation and for the broader scientific community. At the Biocuration meeting in April 2012, we presented our disease vocabulary, MEDIC, which merges diseases from MeSH and OMIM into a hierarchical structure. We have since published a report on this vocabulary, and it is available for download in several formats from the CTD site for community use (Database 2012:bar065).
Analysis tool development. Many new analysis tools and data visualization strategies were implemented during this reporting period; several are highlighted below and described in more detail elsewhere (Nucleic Acids Research 41: D1104-14).
Pathway prediction tool. We expanded the pathway prediction capacity of CTD significantly. Initially we connected users with pathways from the KEGG and Reactome databases based on genes associated with a chemical or disease of interest (see enricher tool below). We recently incorporated all of the curated protein-protein interaction and gene regulatory data from the BIND database and use these data in combination with the Cytoscape visualization tool to allow generation of novel interaction pathways among genes in CTD (e.g., interacting genes for a chemical of interest, or genes that form the basis of an inferred chemical-disease relationship).
This capability allows users not only to identify novel gene sets but also to determine whether there are known interactions among them. We also implemented a separate instance of this functionality that allows users to submit a gene set of interest. This capability is much like that provided by Ingenuity; however, in CTD it is freely accessible.
Gene Set Enricher tool. This tool finds enriched GO or Pathway annotations (from KEGG and Reactome) associated with a gene set. A user can access the tool directly with their specific list of genes (http://ctdbase.org/tools/enricher.go), choose their enrichment analysis, and filter the results by any corrected (or raw) p-value threshold. The tool is also linked to all chemical-disease relationships in CTD so that users may see enriched GO and pathway annotations for each direct or inferred chemical-disease gene set.
Data filtering capabilities. We calculate “comparable” chemicals based on similar interacting gene sets. Previously these calculations considered all types of interactions. To enhance the consistency of comparison, we added the ability to filter the calculations by interaction type and degree (e.g., increased transcription).
Enhanced links to external resources. We now include links from CTD Chemical pages to ChEMBl, a dictionary of molecular entities focused on small chemical compounds, and to PubChem, a repository of chemical compounds and their associated biological activities. CTD Gene pages now have links to WikiGenes, an author-driven wiki system of biological information, and NCBI Gene provides links back to CTD Gene pages. In total, CTD links out to 23 external databases from our Chemical, Gene, Disease, Organism, Gene Ontology, Pathway, and Reference pages.
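As an illustration of the kind of calculation behind the Gene Set Enricher described above, the following Python sketch tests a gene set for over-represented annotations and filters on a raw or corrected p-value. The report does not state which test or correction the tool uses; the hypergeometric test and Bonferroni correction here are assumptions, as are the function names, parameters, and toy data.

```python
from math import comb


def hypergeom_pvalue(k, n, K, N):
    """P(X >= k): chance of seeing at least k annotated genes when drawing
    n genes from a background of N genes, K of which carry the annotation."""
    upper = min(n, K)
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


def enrich(gene_set, annotations, background, threshold=0.01, corrected=True):
    """Return (term, overlap, raw_p, corrected_p) tuples passing the threshold.

    annotations: {term: iterable of genes} (hypothetical layout).
    Correction is Bonferroni over the number of terms (an assumption).
    """
    background = set(background)
    gene_set = set(gene_set) & background
    N, n = len(background), len(gene_set)
    n_tests = len(annotations)
    results = []
    for term, members in annotations.items():
        members = set(members) & background
        k = len(gene_set & members)
        if k == 0:
            continue  # no overlap, nothing to report
        raw_p = hypergeom_pvalue(k, n, len(members), N)
        corrected_p = min(1.0, raw_p * n_tests)
        p = corrected_p if corrected else raw_p
        if p <= threshold:
            results.append((term, k, raw_p, corrected_p))
    return sorted(results, key=lambda r: r[2])


# Toy, made-up data for illustration only.
background = {f"g{i}" for i in range(1, 21)}
annotations = {
    "term A": {"g1", "g2", "g3", "g4"},
    "term B": {f"g{i}" for i in range(5, 15)},
}
query = {"g1", "g2", "g3", "g5"}

results = enrich(query, annotations, background, threshold=0.05)
for term, overlap, raw_p, corrected_p in results:
    print(term, overlap, round(raw_p, 4), round(corrected_p, 4))
```

Passing `corrected=False` applies the same threshold to the raw p-value instead, which corresponds to the choice between corrected and raw thresholds mentioned in the tool description.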
Publications
- Type:
Journal Articles
Status:
Accepted
Year Published:
2012
Citation:
Davis, A.P., C.G. Murphy, R. Johnson, J.M. Lay, K. Lennon-Hopkins, C. Saraceni-Richards, D. Sciaky, B.L. King, M.C. Rosenstein, T.C. Wiegers, and C.J. Mattingly, The Comparative Toxicogenomics Database: update 2013. Nucleic Acids Res, 2013. 41(Database issue): p. D1104-14. PMCID: PMC3531134.
- Type:
Journal Articles
Status:
Accepted
Year Published:
2012
Citation:
Wiegers, T.C., A.P. Davis, and C.J. Mattingly, Collaborative biocuration--text-mining development task for document prioritization for curation. Database (Oxford), 2012. 2012: p. bas037. PMCID: PMC3504477.
- Type:
Journal Articles
Status:
Accepted
Year Published:
2012
Citation:
King, B.L., A.P. Davis, M.C. Rosenstein, T.C. Wiegers, and C.J. Mattingly, Ranking transitive chemical-disease inferences using local network topology in the comparative toxicogenomics database. PLoS One, 2012. 7(11): p. e46524. PMCID: PMC3492369.