Source: COLD SPRING HARBOR LABORATORY ASSOCIATION, INC submitted to
A BEST-OF-BREED SOFTWARE PACKAGE FOR MANAGING PLANT GENOME INFORMATION
Sponsoring Institution
Agricultural Research Service/USDA
Project Status
TERMINATED
Funding Source
Reporting Frequency
Annual
Accession No.
0403840
Grant No.
(N/A)
Project No.
1907-21000-023-06S
Proposal No.
(N/A)
Multistate No.
(N/A)
Program Code
(N/A)
Project Start Date
Sep 15, 2000
Project End Date
Sep 14, 2005
Grant Year
(N/A)
Project Director
WARE D
Recipient Organization
COLD SPRING HARBOR LABORATORY ASSOCIATION, INC
1 BUNGTOWN RD
COLD SPRING HARBOR,NY 11724-2209
Performing Department
COLD SPRING HARBOR LABORATORY
Non Technical Summary
(N/A)
Animal Health Component
(N/A)
Research Effort Categories
Basic
33%
Applied
33%
Developmental
34%
Classification

Knowledge Area (KA)   Subject of Investigation (SOI)   Field of Science (FOS)   Percent
202                   1530                             1080                     50%
203                   1549                             1080                     50%
Goals / Objectives
The objective of this project is to assemble, document and package a portable suite of database technology, user interfaces and ancillary software for the management and presentation of ARS plant databases.
Project Methods
ARS genome databases will be evaluated with respect to current technical status and future needs with particular focus on the impact of high-throughput genomic data. A migration path for existing databases will be designed and implemented to allow use of other database systems, including relational ones. The package will include a distributed system for viewing and managing genome annotations, with the focus on the integration of genetic and physical mapping data with gene predictions and EST-based expression data. The system will be generic to permit its use with data from a wide variety of species, including animals and microbes.
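
The integration of genetic and physical mapping data described above can be illustrated with a minimal sketch. It is not taken from the project's software: the MapFeature record, the correspondences function, and the marker names are hypothetical, and the sketch assumes that features on different maps are linked simply by sharing a name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MapFeature:
    """A named feature placed on one map (hypothetical record layout)."""
    name: str        # marker or gene name, assumed to be shared across maps
    map_name: str    # which map the feature sits on
    position: float  # location on that map
    unit: str        # "cM" for a genetic map, "bp" for a physical map


def correspondences(map_a, map_b):
    """Pair every feature on map_a with same-named features on map_b."""
    by_name = {}
    for feature in map_b:
        by_name.setdefault(feature.name, []).append(feature)
    return [(a, b) for a in map_a for b in by_name.get(a.name, [])]


# Illustrative data only; these markers and positions are invented.
genetic = [
    MapFeature("mk1", "genetic-1", 12.5, "cM"),
    MapFeature("mk2", "genetic-1", 40.0, "cM"),
]
physical = [
    MapFeature("mk1", "physical-1", 1_200_000, "bp"),
    MapFeature("mk3", "physical-1", 3_400_000, "bp"),
]

pairs = correspondences(genetic, physical)
for a, b in pairs:
    print(f"{a.name}: {a.position} {a.unit} <-> {b.position} {b.unit}")
```

Keeping the unit with each position lets one record type serve both kinds of map, which is one simple way to stay generic across species and map types.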

Progress 10/01/04 to 09/30/05

Outputs
4d Progress report. This report serves to document research conducted under a Specific Cooperative Agreement (SCA) between ARS and Cold Spring Harbor Laboratory (CSHL). Additional details of this research can be found in the report of the parent project 1907-21000-014-00D, "Comparative Plant Genomics." The "Best-of-Breed" SCA provides funds to coordinate genome database software development, to document and integrate existing software, and to undertake the development of new software to fill gaps identified in the current holdings. The project was intimately tied to the NIH-funded "Generic Model Organism Database" (GMOD) project, a broader effort to achieve the same goals for the NIH-funded model organism databases. The National Science Foundation funded other portions of this collaborative effort. The past year included updates to two of the software applications directly funded by the SCA, in addition to several applications that were previously or indirectly supported by the SCA.
CMap, directly supported by the SCA, is a tool for managing genetic and physical mapping data and for visualizing the relationships among multiple maps. USDA-funded projects that use CMap include GrainGenes (wheat, barley), the Legume Information System (soybean, Medicago), Gramene (rice, maize, sorghum), GDR (Rosaceae species), and MaizeGDB (maize). Other projects that have deployed CMap include the chicken sequencing project at the Genome Sequencing Center of Washington University at St. Louis; BeeBase, the honeybee genome database at TAMU; and the bovine mapping database, CompLDB, at the Department of Animal Sciences, University of Sydney.
The GBrowse project, directly supported by the SCA, is a system for visualizing the annotations of partial and complete genomes and is similar in concept to the web-based genome browsers at Ensembl and UCSC. Early development of GBrowse was funded entirely by the SCA, with later enhancements subsidized with NIH funds.
Multiple USDA-funded databases use GBrowse: Gramene, GrainGenes, MaizeGDB, the Legume Information System, the Arizona Genomics Computational Lab's FPC (physical maps) web site, and The Institute for Genomic Research (TIGR) databases for multiple plant species including poplar, cotton, and tobacco. It is also widely used by non-USDA databases such as WormBase (C. elegans), dictyBase (D. discoideum), FlyBase (D. melanogaster), T1DBase (human type 1 diabetes), RGD (rat), the International HapMap Project, and the European Arabidopsis Stock Centre (NASC).
SynBrowse is a recent offshoot of GBrowse. It is a web-based visualization tool for macro- and micro-synteny at the genomic level that essentially superimposes two GBrowse panels to show the relationship between two genomes or two regions within the same genome. Initiated under SCA auspices at CSHL, this project then migrated to MaizeGDB, where Xiaokang Pan, who works under USDA ARS funding, carries it on. SynBrowse has only recently been released. Although it is not yet widely used outside of MaizeGDB, it is slated for incorporation into WormBase in late 2005 and has been downloaded multiple times.
The project has also contributed in a major way to coordinating the development of a suite of other tools that, although not directly funded by the SCA, interoperate with one another, thereby reducing the startup costs for new USDA databases and encouraging interoperability among existing databases. The SCA, in combination with NIH funds, helped support a series of meetings among genome database software developers and curators (CSHL, October 2004, and Menlo Park, March 2005). These meetings were instrumental in transforming an insular and parochial community into one that is now interactive and highly cooperative.
In addition, there were numerous smaller workshops, tutorials, and informal get-togethers at the Plant and Animal Genome meetings held each January in San Diego and the October Genome Informatics meetings held at CSHL and Hinxton. The GMOD meetings have also spurred the creation of a new annual BioCurators' meeting, which brings together biological curators from multiple databases, including those funded by the USDA/ARS. The first BioCurators' meeting will be held in December 2005.
The software packages to which the SCA has contributed indirectly include:
1. Chado. This is a genome database schema that acts as the backend for GBrowse, CMap, Apollo, Turnkey, and other components of the GMOD system.
2. Apollo. This is a desktop editor for genomic data, which interoperates with GBrowse via Chado.
3. Turnkey. This is a template-based web site builder that can be used to tie together the GMOD components into a seamless genomic information web site.
4. Textpresso. This is a scientific publication markup and search system that makes a library of reprint PDFs easily searchable.
5. BioMart. This is a data mining system that allows researchers to perform complex ad hoc searches on genomics data and generate customized reports.
6. Pathway Tools. This is a system for curating and displaying biological pathway data and viewing large-scale data sets, such as microarray expression analysis, on top of pathways.
7. PubFetch/PubSearch. These are tools that provide support for literature curation and gene-level functional annotation.

Impacts
(N/A)

Publications


    Progress 09/15/00 to 09/14/05

    Outputs
    4d Progress report. This report serves to document research conducted under a Specific Cooperative Agreement (SCA) between ARS and Cold Spring Harbor Laboratory (CSHL). Additional details of this research can be found in the report of the parent project 1907-21000-023-00D, "Comparative Genomic Analyses, Bioinformatics and Resource Development for Cereal Genomes." The "Best-of-Breed" SCA provides funds to coordinate genome database software development, to document and integrate existing software, and to undertake the development of new software to fill gaps identified in the current holdings. The project was intimately tied to the NIH-funded "Generic Model Organism Database" (GMOD) project, a broader effort to achieve the same goals for the NIH-funded model organism databases. The National Science Foundation funded other portions of this collaborative effort. The past year included updates to two of the software applications directly funded by the SCA, in addition to several applications that were previously or indirectly supported by the SCA.
    CMap, directly supported by the SCA, is a tool for managing genetic and physical mapping data and for visualizing the relationships among multiple maps. USDA-funded projects that use CMap include GrainGenes (wheat, barley), the Legume Information System (soybean, Medicago), Gramene (rice, maize, sorghum), GDR (Rosaceae species), and MaizeGDB (maize). Other projects that have deployed CMap include the chicken sequencing project at the Genome Sequencing Center of Washington University at St. Louis; BeeBase, the honeybee genome database at TAMU; and the bovine mapping database, CompLDB, at the Department of Animal Sciences, University of Sydney.
    The GBrowse project, directly supported by the SCA, is a system for visualizing the annotations of partial and complete genomes and is similar in concept to the web-based genome browsers at Ensembl and UCSC.
    Early development of GBrowse was funded entirely by the SCA, with later enhancements subsidized with NIH funds. Multiple USDA-funded databases use GBrowse: Gramene, GrainGenes, MaizeGDB, the Legume Information System, the Arizona Genomics Computational Lab's FPC (physical maps) web site, and The Institute for Genomic Research (TIGR) databases for multiple plant species including poplar, cotton, and tobacco. It is also widely used by non-USDA databases such as WormBase (C. elegans), dictyBase (D. discoideum), FlyBase (D. melanogaster), T1DBase (human type 1 diabetes), RGD (rat), the International HapMap Project, and the European Arabidopsis Stock Centre (NASC).
    SynBrowse is a recent offshoot of GBrowse. It is a web-based visualization tool for macro- and micro-synteny at the genomic level that essentially superimposes two GBrowse panels to show the relationship between two genomes or two regions within the same genome. Initiated under SCA auspices at CSHL, this project then migrated to MaizeGDB, where Xiaokang Pan, who works under USDA ARS funding, carries it on. SynBrowse has only recently been released. Although it is not yet widely used outside of MaizeGDB, it is slated for incorporation into WormBase in late 2005 and has been downloaded multiple times.
    The project has also contributed in a major way to coordinating the development of a suite of other tools that, although not directly funded by the SCA, interoperate with one another, thereby reducing the startup costs for new USDA databases and encouraging interoperability among existing databases. The SCA, in combination with NIH funds, helped support a series of meetings among genome database software developers and curators (CSHL, October 2004, and Menlo Park, March 2005). These meetings were instrumental in transforming an insular and parochial community into one that is now interactive and highly cooperative.
    In addition, there were numerous smaller workshops, tutorials, and informal get-togethers at the Plant and Animal Genome meetings held each January in San Diego and the October Genome Informatics meetings held at CSHL and Hinxton. The GMOD meetings have also spurred the creation of a new annual BioCurators' meeting, which brings together biological curators from multiple databases, including those funded by the USDA/ARS. The first BioCurators' meeting will be held in December 2005.
    The software packages to which the SCA has contributed indirectly include:
    1. Chado. This is a genome database schema that acts as the backend for GBrowse, CMap, Apollo, Turnkey, and other components of the GMOD system.
    2. Apollo. This is a desktop editor for genomic data, which interoperates with GBrowse via Chado.
    3. Turnkey. This is a template-based web site builder that can be used to tie together the GMOD components into a seamless genomic information web site.
    4. Textpresso. This is a scientific publication markup and search system that makes a library of reprint PDFs easily searchable.
    5. BioMart. This is a data mining system that allows researchers to perform complex ad hoc searches on genomics data and generate customized reports.
    6. Pathway Tools. This is a system for curating and displaying biological pathway data and viewing large-scale data sets, such as microarray expression analysis, on top of pathways.
    7. PubFetch/PubSearch. These are tools that provide support for literature curation and gene-level functional annotation.

    Impacts
    (N/A)

    Publications


      Progress 10/01/03 to 09/30/04

      Outputs
      4. What were the most significant accomplishments this past year? D. Progress Report: This report serves to document research conducted under a Specific Cooperative Agreement between ARS and Cold Spring Harbor Laboratory (CSHL) entitled "A best-of-breed software package for managing plant genome information." Additional details of the research can be found in the report for the parent project 1907-21000-013-00D, "Dissection of complex maize traits using genomics, germplasm, and bioinformatics." The objective of this project is to assemble, document, and package a portable suite of database technology, user interfaces, and ancillary software for the management and presentation of ARS plant databases. This project focused on the development of visualization tools for genomes in collaboration with Dr. Lincoln Stein of CSHL. Specific resource development includes the ongoing support of the Gramene database (www.gramene.org) and the development of open source software as part of the GMOD project (www.gmod.org). The Gramene database continues to be maintained and updated on a quarterly basis. In the past year, software has been updated for the genome browser, CMap view, marker view, and the mutant and QTL modules. For the rice genome, we have transitioned from a BAC-based display to a display based on the genome assembly. The cereal sequences and maps were updated twice this year. We have completed curation of all available rice QTL reported in the literature over the past 5 years. Online tutorials were developed and implemented for each Gramene search module.

      Impacts
      (N/A)

      Publications