Source: MISSISSIPPI STATE UNIV submitted to
KNOWLEDGE REPRESENTATION RESOURCES FOR ANIMAL AGRICULTURAL RESEARCHERS
Sponsoring Institution
National Institute of Food and Agriculture
Project Status
TERMINATED
Funding Source
Reporting Frequency
Annual
Accession No.
0224891
Grant No.
2011-67015-30332
Project No.
MIS-391110
Proposal No.
2010-04525
Multistate No.
(N/A)
Program Code
A1201
Project Start Date
Apr 1, 2011
Project End Date
Mar 31, 2014
Grant Year
2011
Project Director
McCarthy, F. M.
Recipient Organization
MISSISSIPPI STATE UNIV
(N/A)
MISSISSIPPI STATE,MS 39762
Performing Department
LSBI / IGBB
Non Technical Summary
For the first time in history, biologists have access to technologies that enable them to rapidly generate enormous amounts of data about the genomes of our agricultural species. However, researchers using these technologies now face a major bottleneck in deriving knowledge from data to use it for improving agricultural productivity. Our goal is to enable researchers to accelerate knowledge delivery from research investments by giving them the tools to avoid the current bottleneck. We will do this by linking existing information about how genes work to biological data; developing novel and improved methods for predicting links between our existing knowledge and biological data; and providing new tools for viewing how biological data relates to different species. The tools, training and resources that we develop are easily extended to other species. Not only will we provide data and tools, but we will also provide integrated, practical training for the next generation of US researchers. This training ensures that researchers are able to use the resources we provide. Our training component also specifically targets traditionally under-represented minorities in science and technology. The outcome of this project is that researchers will be able to more effectively and efficiently convert the power of genomic research into gains for US agriculture and consumers. Overall, the impact of our work is the improved ability for researchers to benefit society through improved agricultural systems, renewable energy, aquaculture, human nutrition, food safety and biotechnology. The societal impact of our education initiative is recruitment of minorities to emerging areas of biology via novel education and training opportunities.
Animal Health Component
(N/A)
Research Effort Categories
Basic
100%
Applied
(N/A)
Developmental
(N/A)
Classification

Knowledge Area (KA)    Subject of Investigation (SOI)    Field of Science (FOS)    Percent
304                    3910                              1030                      25%
304                    3910                              1040                      25%
304                    3910                              1050                      25%
304                    3910                              1080                      25%
Goals / Objectives
While genomic tools like microarrays, SNP chips, linkage maps and HapMaps are increasingly available for agricultural researchers, translating the use of these tools into gains in agriculture requires supporting biocomputational resources. The overall objective of this proposal is to link high throughput data sets and existing knowledge about gene function to phenotypes and functions of importance to agriculture. We will achieve this by (1) providing targeted Gene Ontology (GO) biocuration for agricultural animals; (2) developing computational pipelines to support rapid, functional annotation; and (3) developing resources and tools to link phenotypes and traits to functions. The expected outcomes of this proposal are: (a) core sets of biocurated data that support modeling of genomic and genetics data; (b) computational tools that use this data; (c) expanded outreach and training for agricultural researchers who wish to model their genomics and genetics datasets; (d) rapid analysis of existing literature to support community biocuration projects; (e) computational tools to provide 'first pass' biocuration data for different types of experimental data; (f) improved ability for databases to manage and share annotated data; (g) improved ability for researchers to link traits to function; and (h) omics education for the next generation of agricultural researchers, including minority undergraduate students not traditionally engaged in agricultural research.
Project Methods
Firstly, we will provide targeted Gene Ontology (GO) biocuration for agricultural animals by computationally mining published literature to determine knowledge representation for each agricultural species; using this information to provide targeted manual GO biocuration for key genes being intensively studied in each species; and providing training for agricultural researchers who wish to model their omics data. This fulfills the need for high quality functional annotations to support biological modeling of agricultural animal data. Secondly, we will develop computational pipelines for rapid functional annotation by developing a novel computational 'first pass' annotation pipeline based on extracting data from published literature; scaling up our existing computational pipelines to deal with increasing sequence data that requires functional annotation; and incorporating training about the use of these new annotation data in our GO training workshops. This aim fulfills the need for new computational annotation methods to address the massive scale of genomics data that agricultural researchers face. Thirdly, we will link phenotypes and traits to functions by developing appropriate comparative genomic browsers to allow simultaneous visualization of genomic data across multiple species; and new resources that use computational text mining and knowledge extraction tools to provide large-scale prediction of candidate genes for known QTLs. These tools will be incorporated into our GO training workshops as they become available. This aim fulfills the need for researchers to integrate genomic and genetic data to determine underlying mechanisms of traits and phenotypes.
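As an illustration only, the idea of using literature coverage to target manual GO biocuration can be sketched as a simple ranking. The scoring rule below (papers mentioning a gene divided by one plus its existing GO annotation count) and the gene names are hypothetical assumptions made for this sketch; the report does not specify how the project's own prioritization is computed.

```python
from typing import Dict, List, Optional, Tuple


def prioritize(
    paper_counts: Dict[str, int],
    annotation_counts: Dict[str, int],
    top_n: Optional[int] = None,
) -> List[Tuple[str, float]]:
    """Rank genes that are heavily studied but poorly annotated.

    paper_counts: number of papers mentioning each gene (assumed input).
    annotation_counts: number of existing GO annotations per gene (assumed input).
    """
    scored = []
    for gene, papers in paper_counts.items():
        annotations = annotation_counts.get(gene, 0)
        # Hypothetical score: more papers raise priority, existing annotations lower it.
        score = papers / (1 + annotations)
        scored.append((gene, score))
    # Highest score first; ties broken alphabetically for a stable order.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_n]


# Made-up example data.
example_papers = {"GENE_A": 40, "GENE_B": 10, "GENE_C": 40}
example_annotations = {"GENE_A": 9, "GENE_B": 0}
ranking = prioritize(example_papers, example_annotations)
```

Under this assumed rule, a gene with many papers and no annotations rises to the top of the curation list, which matches the stated intent of directing manual effort to intensively studied genes.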

Progress 04/01/11 to 03/31/14

Outputs
Target Audience: All data produced as part of this project is made freely and publicly available via the AgBase database website. We are disseminating data and tools directly by providing training workshops for the research community. We also report on our progress at the USDA NRSP8 and NC1170 project meetings.
Changes/Problems: Nothing Reported
What opportunities for training and professional development has the project provided? This project provides PhD training for two students, and provides professional development via training workshops for researchers and postdoctoral associates. In addition, we have developed bioinformatics training resources for undergraduates.
How have the results been disseminated to communities of interest? We are disseminating data and tools directly by providing training workshops for the research community. We also report on our progress at the USDA NRSP8 and NC1170 project meetings.
What do you plan to do during the next reporting period to accomplish the goals? Nothing Reported

Impacts
What was accomplished under these goals?
1) major activities completed: We have completed the development of a bioinformatics course to support training of minority undergraduate students and this is now available as an online resource. We have also completed prioritizing GO annotation for chicken, cow and sheep literature using the eGIFT tool. The development of functional analysis tools continues, as we are working with iPlant to move existing tools onto the iPlant cyberinfrastructure to better facilitate analysis of larger, more complex data sets.
2) specific objectives met:
(a) targeted Gene Ontology (GO) biocuration for agricultural animals - biocuration of literature continues for agricultural animals.
(b) developing computational pipelines to support rapid, functional annotation - we have developed two experimental annotation pipelines and manuscripts are in preparation. Work continues to develop iTerm mapping files for existing ontologies.
(c) developing resources and tools to link phenotypes and traits to functions - we are working with the NSF funded Phenotype RCN to develop phenotype curation methodology and resources for chicken.
3) significant results achieved, including major findings, developments, or conclusions (both positive and negative): AgBase now provides 904,866 GO annotations for 159,111 gene products from 70 species and we are working to expand our biocuration effort to include tissue/cell types, molecular interactions and anatomical structures. A limitation of our current biocuration effort is that there is little biocuration of important agricultural pathogens, an area that is of interest to many agricultural researchers. Moreover, we are working to develop analysis tools that include functional information from data types other than the GO (e.g., tissue/cell types and anatomical structures).
4) key outcomes or other accomplishments realized: The key outcome for this report is the development of a strategy to integrate AgBase tool development within the iPlant cyberinfrastructure. This will enable users with larger data sets to more easily analyze their data, enable researchers to develop collaborative projects and disseminate tools and data to a larger audience.

Publications

  • Type: Journal Articles Status: Published Year Published: 2014 Citation: Trichler, Shauna A., et al. "Identification of canine platelet proteins separated by differential detergent fractionation for nonelectrophoretic proteomics analyzed by Gene Ontology and pathways analysis." Veterinary Medicine: Research & Reports 4 (2014).
  • Type: Journal Articles Status: Published Year Published: 2013 Citation: McCarthy, F. M., and E. Lyons. "From data to function: Functional modeling of poultry genomics data." Poultry science 92.9 (2013): 2519-2529.


Progress 04/01/12 to 03/31/13

Outputs
Target Audience: Provided functional modeling training for agricultural researchers, postdoctoral associates and students. Provided bioinformatics training for undergraduate students at an HBCU.
Changes/Problems: While the initial proposal included a focus on developing resources and tools to link phenotypes and traits to functions, current developments with the NSF funded Phenotype RCN project and with developing the iAnimal portal based upon the iPlant Collaborative make this goal somewhat obsolete. To support both of these initiatives, we instead plan to develop animal-oriented resources on the iPlant cyberinfrastructure. In addition, we are also modifying the content of our training workshops to respond to changing needs as researchers try to model larger and more complex data sets. We are working to incorporate modules on network and pathways analysis and to introduce information about how to develop a first pass functional annotation for species that currently have none.
What opportunities for training and professional development has the project provided? This project provides PhD training for two students, and provides professional development via training workshops for researchers and postdoctoral associates. In addition, we have developed bioinformatics training resources for undergraduates.
How have the results been disseminated to communities of interest? We are disseminating data and tools directly by providing training workshops for the research community. We also report on our progress at the USDA NRSP8 and NC1170 project meetings.
What do you plan to do during the next reporting period to accomplish the goals?
(a) targeted Gene Ontology (GO) biocuration for agricultural animals - Continue biocuration of literature for agricultural animals by developing prioritization lists for additional species (e.g., horse, pig, aquaculture species). Use the existing prioritization lists to expand biocuration efforts for the initial species.
(b) developing computational pipelines to support rapid, functional annotation - Develop iTerm mapping files for anatomy and cell/tissue ontologies.
(c) developing resources and tools to link phenotypes and traits to functions - We expect that during the next reporting period the focus will change to developing functional analysis tools within the iPlant cyberinfrastructure. While this goal was not included in the initial project outline, it is a natural and logical extension of this proposal that will benefit the broader animal agriculture research community.

Impacts
What was accomplished under these goals?
1) major activities completed: We have completed the development of a bioinformatics course to support training of minority undergraduate students and this is now available as an online resource. We have also completed prioritizing GO annotation for chicken, cow and sheep literature using the eGIFT tool. The development of functional analysis tools continues, as we are working with iPlant to move existing tools onto the iPlant cyberinfrastructure to better facilitate analysis of larger, more complex data sets.
2) specific objectives met:
(a) targeted Gene Ontology (GO) biocuration for agricultural animals - biocuration of literature continues for agricultural animals.
(b) developing computational pipelines to support rapid, functional annotation - we have developed two experimental annotation pipelines and manuscripts are in preparation. Work continues to develop iTerm mapping files for existing ontologies.
(c) developing resources and tools to link phenotypes and traits to functions - we are working with the NSF funded Phenotype RCN to develop phenotype curation methodology and resources for chicken.
3) significant results achieved, including major findings, developments, or conclusions (both positive and negative): AgBase now provides 904,866 GO annotations for 159,111 gene products from 70 species and we are working to expand our biocuration effort to include tissue/cell types, molecular interactions and anatomical structures. A limitation of our current biocuration effort is that there is little biocuration of important agricultural pathogens, an area that is of interest to many agricultural researchers. Moreover, we are working to develop analysis tools that include functional information from data types other than the GO (e.g., tissue/cell types and anatomical structures).
4) key outcomes or other accomplishments realized: The key outcome for this report is the development of a strategy to integrate AgBase tool development within the iPlant cyberinfrastructure. This will enable users with larger data sets to more easily analyze their data, enable researchers to develop collaborative projects and disseminate tools and data to a larger audience.

Publications

  • Type: Journal Articles Status: Published Year Published: 2012 Citation: Das PJ, McCarthy F, Paria N, Vishnoi M, Gresham C, Gang L, Kachroo P, Sudderth KA, Teague S, Love CC, Varner DD, Chowdhary BP, and Raudsepp T. (2012). Stallion sperm transcriptome comprises functionally coherent coding and regulatory RNAs as revealed by microarray analysis and RNA-seq. PLOS One 8(2): e56535
  • Type: Journal Articles Status: Accepted Year Published: 2012 Citation: McCarthy F, and Lyons E.L. (2012). From Data to Function: Functional Modeling of Poultry Genomics Data. Poultry Science, in press.
  • Type: Journal Articles Status: Published Year Published: 2012 Citation: Pillai* L, Chouvarine* P, Tudor CO, Schmidt CJ, Vijay-Shanker K, McCarthy FM. (2012). Developing a biocuration workflow for AgBase, a non-model organism database. Database 2012:bas038. doi: 10.1093/database/bas038
  • Type: Journal Articles Status: Published Year Published: 2012 Citation: Tudor CO, Vijay-Shanker K. (2012) RankPref: Ranking Sentences Describing Relations between Biomedical Entities with an Application. BioNLP: Proceedings of the 2012 Workshop on Biomedical Natural Language Processing. 163-171.


Progress 04/01/11 to 03/31/12

Outputs
OUTPUTS: Our outputs for Aim 1 include development of the eGIFT tool to analyze existing literature for each species and use this to develop a ranked list of prioritized genes for manual curation. The eGIFT prototype uses chicken as its target species and we are currently evaluating this and expanding our literature to include other species. We are also providing continued and expanded training for agricultural researchers who wish to model their omics data. We recently held a training workshop ("Genomic Annotation and Functional Modeling Workshop") at the Maxwell H. Gluck Equine Research Center, University of Kentucky (15-16 November, 2011). 32 registered participants attended, including graduate students and postdoctoral researchers. We also participated in a mini-workshop held at International Plant & Animal Genome XX (January 14-18, 2012) with the aim of disseminating this knowledge to a wider audience within the equine genomics community. Drs McCarthy and Gao offered the first "Introduction to Bioinformatics" class at Alcorn State during the Fall semester. The course is offered as a split level graduate/undergraduate course at both MSU and ASU and forms part of ASU's Biotechnology Certificate Course. Due to the logistics of setting up a new course, this class was only offered to undergraduate students late, and only seven undergraduate students enrolled. We are seeking feedback from the initial undergraduate students in this course and expect to be able to expand enrollment during Fall semester of 2012. Students in this class also participated in the Community Assessment of Community Annotation with Ontologies (CACAO) initiative.
Outcomes for Aim 2 include the development and deployment of Genome2Seq, a tool that rapidly looks up genome co-ordinates generated from RNA-Seq data and returns genes and Gene Ontology (GO) annotation when the co-ordinates map to annotated genes, and a FASTA sequence file when the co-ordinates do not map to previously annotated genes. Genome2Seq is available via the AgBase website and the NRSP8 Bioinformatics website. We are also adapting a previously published method that produces more informative summaries by slimming GO terms based on the experimental data set used. This tool is called AutoSlim and is in the final stages of testing prior to its release on AgBase. Two existing AgBase ID mapping tools (ArrayIDer and AffyIDer) are being reconfigured to handle a larger number of accession types and combined for ease of use. This will enable researchers to more easily access existing tools for functional analysis of large data sets. The existing eGIFT is being expanded to allow users to request eGIFT analysis of genes that have not yet entered the database. Upon receiving a request, eGIFT identifies the gene-specific iTerms, creates the corresponding gene page and then notifies the requestor that the job was completed.
For Aim 3, we are currently investigating how to apply our existing text-mining tool to QTL analysis.
PARTICIPANTS: Individuals: Fiona M McCarthy (PI) - worked on developing GO annotation priority lists, tool development and teaching training workshops. Carl J. Schmidt (coPI) - worked on development of eGIFT and how to apply our existing text-mining tool to QTL analysis. K. Vijay Shanker (coPI) - worked on development of eGIFT. C. Oana Tudor (Postdoc) - worked on development of eGIFT. Ashique Mahmood (Postdoc) - worked on development of eGIFT. Teresia Buza (Postdoc) - quality control of GO annotations in AgBase. Cathy Gresham (RA) - AgBase database management, including tool development and deployment. Tony Arick (RA) - tool development and deployment. Samuel Camacaro - tool development and deployment.
Partner organizations: University of Delaware is also listed on this project and provides key personnel for the development of eGIFT and comparative genomics tools. Alcorn State University provides training support for undergraduate informatics classes. Texas A&M provides CACAO systems for undergraduate training. University of Arizona provides infrastructure support via iPlant and collaborates on developing tools for avian comparative genomics.
Collaborators and contacts: Bindu Nanduri (MSU) provides data for molecular interaction for agricultural host-pathogens. Jim Reecy (ISU) provides support via NRSP8 Bioinformatics funding.
Training and professional development: We provided training workshops for agricultural researchers (including postdocs and graduate students from University of Kentucky) and equine researchers attending PAG XX. We also provided bioinformatics education for undergraduate and graduate students from MSU and Alcorn State University.
TARGET AUDIENCES: Researchers - we are targeting agricultural researchers from US universities by providing on-site training workshops about functional modeling of their data sets. Students - we are targeting minority students by partnering with Alcorn State University, an HBCU with >80% enrollment of African-Americans.
PROJECT MODIFICATIONS: Not relevant to this project.
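The Genome2Seq behavior described in the Aim 2 outputs (return the gene and its GO annotation when co-ordinates map to an annotated gene, otherwise return a FASTA sequence) can be sketched as follows. This is a minimal illustration under stated assumptions, not the AgBase implementation: the Gene record, the find_gene and lookup functions, the in-memory genome dictionary and the example data are all hypothetical, and co-ordinates are assumed to be 1-based and inclusive.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Gene:
    gene_id: str
    chrom: str
    start: int  # 1-based, inclusive (assumed convention)
    end: int    # 1-based, inclusive (assumed convention)
    go_terms: Tuple[str, ...]


def find_gene(genes: List[Gene], chrom: str, start: int, end: int) -> Optional[Gene]:
    """Return the first annotated gene overlapping the query interval, if any."""
    for gene in genes:
        if gene.chrom == chrom and gene.start <= end and start <= gene.end:
            return gene
    return None


def lookup(genes: List[Gene], genome: Dict[str, str],
           chrom: str, start: int, end: int) -> Dict[str, object]:
    """Return gene and GO terms for annotated regions, else a FASTA record."""
    gene = find_gene(genes, chrom, start, end)
    if gene is not None:
        return {"gene": gene.gene_id, "go_terms": list(gene.go_terms)}
    # No annotated gene overlaps: extract the sequence (convert to 0-based slicing).
    sequence = genome[chrom][start - 1:end]
    return {"fasta": f">{chrom}:{start}-{end}\n{sequence}\n"}


# Made-up example data: a 20-base chromosome and one annotated gene.
example_genome = {"chr1": "ACGTACGTACGTACGTACGT"}
example_genes = [Gene("GENE_A", "chr1", 5, 10, ("GO:0005515",))]

annotated_hit = lookup(example_genes, example_genome, "chr1", 7, 12)
unannotated_hit = lookup(example_genes, example_genome, "chr1", 13, 16)
```

A linear scan keeps the sketch short; for genome-scale annotation sets an interval index would be the more usual choice.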

Impacts
Impacts from our work include promoting teaching and training of graduate students, new investigators and undergraduates. Our training workshops are open to graduate students, postdoctoral associates and junior researchers; frequently they are the ones focused on functional analysis of large data sets and we are able to ensure that they have the training they require to leverage new 'omics' technologies. We target undergraduate students (and additional graduate students) via our MSU/ASU Introduction to Bioinformatics class. Although enrollment numbers were low in this first class, 80% of enrolled students are African-American, 70% are female and 75% of graduate students enrolled in this class were dual degree DVM-Ph.D candidates. Moreover, these students interacted with students from other institutes in the US and UK via the CACAO initiative. We are also enhancing infrastructure for research and education by developing online data, tools, training and support to ensure that agricultural researchers are able to leverage their functional genomics data for the benefit of society. Our training workshops ensure that US agricultural researchers are at the forefront of these technologies, and as we recruit and train biocurators for AgBase we develop a cadre of highly trained biocurators to support agricultural research; we are one of only three groups worldwide providing agricultural functional annotation to the GO Consortium. Moreover, this project supports cross-disciplinary training for early career researchers working in biology and engineering. For example, co-PI Schmidt uses eGIFT in his Bioinformatics class to introduce students to the concepts of text-mining and ontology annotation. This course is taken by senior undergraduates and graduate students from the life and engineering sciences. Also, one graduate student and one post-doctoral student at the University of Delaware work directly on this project.

Publications

  • Publications: Li X, Swaggerty CL, Kogut MH, Chiang HI, Wang Y, He H, Genovese KJ, McCarthy FM, Burgess SC, Pevzner IY, Zhou H (2011). Systematic response to Campylobacter jejuni infection by profiling spleen gene expression in two genetic lines of chickens that differ in their susceptibility to C. jejuni colonization. Immunogenetics 64(1):59-69.
  • Sanders WS, Wang N, Bridges SM, Malone BM, Dandass YS, McCarthy FM, Nanduri B, Lawrence ML, Burgess SC. (2011). The Proteogenomic Mapping Pipeline Tool. BMC Bioinformatics 12:115.
  • Bright LA, Mujahid N, McCarthy FM, Costa LRR, Burgess SC, Swiderski CE (2011). Functional modelling of an equine bronchoalveolar lavage fluid proteome provides experimental confirmation and functional annotation of equine genome sequences. Animal Genet. 42(4):395-405.
  • Presentations: McCarthy F.M., Schmidt C., Antin P. and Burgess S.C. BirdBase Update: Progress towards standardized gene nomenclature and tissue specific gene expression. Poultry Workshop, Plant and Animal Genome XX Conference January 14-18, 2012. San Diego, California.
  • Carole Nail, Philippe Chouvarine, Janet Weber, Yachi Spencer, Swati Kumari, Shane Burgess, Carl Schmidt, Parker Antin, Fiona McCarthy. Manual biocuration to support standardized chicken gene nomenclature at CGNC. The 6th International Chick Meeting Roslin Institute, Sept. 17-20, 2011.
  • Meeting Abstracts: Carl Schmidt, Philippe Chouvarine, Tim Keeler, Li Jin, Keith Decker, Veronica Shamovsky, Peter D'Eustachio, Parker Antin, Fiona McCarthy. Birdbase: A Database of Avian Genes and Genomes. The 6th International Chick Meeting Roslin Institute, Sept. 17-20, 2011.