Source: AGRICULTURAL RESEARCH SERVICE submitted to
CURATION AND DEVELOPMENT OF SOYBASE AND ITS INTEGRATION WITH OTHER PLANT GENOME DATABASES
Sponsoring Institution
Agricultural Research Service/USDA
Project Status
TERMINATED
Funding Source
Reporting Frequency
Annual
Accession No.
0407002
Grant No.
(N/A)
Project No.
3625-21000-038-00D
Proposal No.
(N/A)
Multistate No.
(N/A)
Program Code
(N/A)
Project Start Date
Mar 21, 2003
Project End Date
Mar 20, 2008
Grant Year
(N/A)
Project Director
SHOEMAKER R C
Recipient Organization
AGRICULTURAL RESEARCH SERVICE
RR #3 BOX 45B
AMES,IA 50011
Performing Department
(N/A)
Non Technical Summary
(N/A)
Animal Health Component
(N/A)
Research Effort Categories
Basic
20%
Applied
80%
Developmental
0%
Classification

Knowledge Area (KA)    Subject of Investigation (SOI)    Field of Science (FOS)    Percent
201                    2499                              1080                      10%
201                    1820                              1080                      90%
Goals / Objectives
Objective 1 - Annotate soybean genomic sequence and mapping data with relevant biological information. Objective 2 - Incorporate new soybean genetic and agronomic data into SoyBase, and modify SoyBase, the Legume Information System (LIS), and other legume databases so that they can accommodate emerging data types. Objective 3 - Develop prototypes for the Legume Information System databases.
Project Methods
Soybean DNA sequences will be annotated, and genetic and agronomic data will be incorporated into legume databases. Automated annotation tools and tools such as CMap and Pathway Tools will be adapted and applied. The management software (BioCyc) used to automate curation and edit the pathway database will be evaluated and adapted as needed. Research activities will include comparative analyses, gene discovery, and genetic and transcript mapping. SoyBase staff will participate in the development of a Legume Information System (LIS). Investigators will work closely with the National Center for Genome Resources (NCGR), Santa Fe, NM, to integrate soybean and other legume data into a relational Legume Information System that will include map-based analysis tools and visualization systems. The database will focus on sequence data as well as proteomic, phenotypic, genetic, and other biologically relevant data to facilitate data analysis and interpretation. BL1; Recertified August 12, 2005.

Progress 03/21/03 to 03/20/08

Outputs
Progress Report
Significant Activities that Support Special Target Populations
Soybean Expressed Sequence Tags (ESTs) were positioned on the genetic and physical maps using a combined bioinformatics and experimental approach. In collaboration with Purdue University, we designed overgos and associated them with Bacterial Artificial Chromosomes (BACs). This allowed us to position the ESTs on the physical map and thus on the genetic map. We collaborated with the Plant Ontology Consortium (POC) to add soybean-specific terms to the Plant Growth and Development Ontology.
This is a first step toward creating a controlled vocabulary for plant traits and developmental stages, which will facilitate the comparison of biological processes among the various legume species. We responded to requests from the soybean community by annotating the exemplar sequences of the gene chip with functional and gene ontology terms. We also associated the probe sequences with several EST gene indices (SoyBase pHap, TIGR TC, and PlantGDB PUT sequences). These data were made available to the community through a custom database and web interface. A resource with consistent annotation of the Soybean GeneChip exemplars will facilitate the comparison of results between research groups. We used CMap to implement an improved display for genetic maps and associated data. We performed ongoing curation and maintenance of SoyBase. Data classes were developed and added to SoyBase as needed to accommodate new data types or to expand functionality. We improved the Soybean Breeder's Toolbox (SBT) with a custom web-based interface for browsing and querying. An updated version of the soybean physical map was added to SoyBase. The genetic and physical maps were completely integrated via the SBT, CMap, and custom modifications to the web map displays. A large resistance gene cluster was characterized in the Medicago genus. We worked closely with colleagues at the National Center for Genome Resources to design and deploy a set of databases and user interfaces that encompass genomic and EST sequences, and genetic maps with trait-based Quantitative Trait Loci. Tools were developed and deployed to allow the user to select subsets of the data based on developmental or physiological terms. The user interface allows interspecific comparisons for the various data types in the Legume Information System.
We prepared a Semantic Web service to begin integration of SoyBase and SBT data and analysis tools with the NCGR Virtual Plant Information Network and the SSWAP system. This progress relates to NP 301 Component 2, Problem Statements 2A and 2B, because these activities enhance the interoperability of plant databases and facilitate the physical and genetic mapping of the soybean genome, thus making it possible to compare genome structures.
Technology Transfer
Number of Web Sites managed: 3

Impacts
(N/A)

Publications

  • Ameline-Torregrosa, C., Wang, B., O'Bleness, M., Deshpande, S., Zhu, H., Roe, B., Young, N.D., Cannon, S.B. 2008. Identification and characterization of NBS-LRR encoded genes in the model plant Medicago truncatula. Plant Physiology. 146:5-21.
  • Grant, D.M., Nelson, R., Graham, M.A., Shoemaker, R.C. 2008. Bioinformatic resources for soybean genetic and genomic research. In: Stacey, Gary, editor. Genetics and Genomics of Soybean. New York, New York. Springer. p. 141-162.


Progress 10/01/06 to 09/30/07

Outputs
Progress Report
Accomplishments
Public Release of the Soybean Breeder's Toolbox. Asking complex questions of a database requires that the database be structured to support them. In response to requests from USDA scientists and the soybean community, we developed and released the first version of the Soybean Breeder's Toolbox, an alternate interface to the data in SoyBase with a special emphasis on making the data available in a format most useful to soybean breeders. Genetic maps are now displayed using CMap, which allows a richer display of the maps as well as side-by-side comparisons of related maps.
We also provide links between the genetic maps and the soybean Williams 82 physical map. The impact of this new design is that soybean breeders can now search the data in SoyBase more intuitively and can visualize the results both in the context of the genetic and physical maps and as full-text records from the database. This allows more efficient access to complex data sets. This research fits under NP 301, Plant Genetic Resources, Genomics, and Genetics Improvement; Action Plan Component II, Crop Informatics, Genomics, and Genetic Analyses (Problem Statement 2A: Genome Database Stewardship and Informatics Tool Development).
Annotation of Affymetrix Soybean GeneChip Array and release of tools for querying the annotations. Global gene expression studies can identify genes potentially involved in important processes. However, unless the identity of the gene is known, the information is not valuable. In response to requests from USDA scientists and the soybean community, we generated two types of annotations. First, for each 25-base oligonucleotide on the array, we generated a list of every soybean EST consensus sequence from the several public gene indices that contained a perfect sequence match. Second, we provided a putative functional annotation for each gene represented on the array when this could be determined. Web-based tools were developed to allow searching of the annotations with a choice of formats for the results. The impact of these annotations and querying tools is that researchers will be able to better interpret their experimental results and quickly determine the possible function(s) of genes identified in their experiments. This research fits under NP 301, Plant Genetic Resources, Genomics, and Genetics Improvement; Action Plan Component II, Crop Informatics, Genomics, and Genetic Analyses (Problem Statement 2A: Genome Database Stewardship and Informatics Tool Development).
Analysis of genes that recognize a soybean symbiont for nodulation. Although nodulation and nitrogen fixation are critical to legume physiology, too little is known about the gene families involved in some of these processes. In this project, we used laboratory and informatic techniques to identify all genes related to LysM receptor genes. We compared these to related sequences from other plants and found that one subgroup, involved in nodulation, is evolving more quickly than other members of this large family. Some members of the family function in pathogen perception, and some, in legumes, function in nodulation. This information is important in that it emphasizes the importance of biological nitrogen fixation and provides clues about the evolution of important plant traits. This research fits under NP 301, Plant Genetic Resources, Genomics, and Genetics Improvement; Action Plan Component II, Crop Informatics, Genomics, and Genetic Analyses, Problem Statement 2B: Structural Comparison and Analysis of Crop Genomes, and Problem Statement 2C: Genetic Analyses and Mapping of Important Traits.
Technology Transfer
Number of Web Sites managed: 7
Number of Non-Peer Reviewed Presentations and Proceedings: 6
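
The probe-level annotation described above (listing, for each 25-base oligonucleotide, every EST consensus sequence that contains a perfect match) can be illustrated with a minimal Python sketch. The report does not describe the project's actual software, so this is only an illustration: the function name, the dictionary inputs, and the forward-strand-only search are all assumptions.

```python
from collections import defaultdict


def index_probe_matches(probes, consensus_seqs, probe_len=25):
    """Map each probe ID to the consensus-sequence IDs that contain it exactly.

    probes and consensus_seqs are hypothetical inputs: dicts of ID -> DNA string.
    Only the forward strand is searched (an assumption, not stated in the report).
    """
    # Index probes by sequence so each window of a consensus is one dict lookup.
    probes_by_seq = defaultdict(list)
    for probe_id, seq in probes.items():
        if len(seq) != probe_len:
            raise ValueError(f"probe {probe_id} is not {probe_len} bases long")
        probes_by_seq[seq.upper()].append(probe_id)

    matches = {probe_id: [] for probe_id in probes}
    for cons_id, cons_seq in consensus_seqs.items():
        cons_seq = cons_seq.upper()
        already_hit = set()  # record each probe at most once per consensus
        for start in range(len(cons_seq) - probe_len + 1):
            window = cons_seq[start:start + probe_len]
            for probe_id in probes_by_seq.get(window, ()):
                if probe_id not in already_hit:
                    already_hit.add(probe_id)
                    matches[probe_id].append(cons_id)
    return matches
```

Indexing the probes rather than scanning every probe against every sequence keeps the work proportional to the total length of the consensus sequences, which matters when several gene indices are searched together.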

Impacts
(N/A)

Publications

  • Febrer, M., Cheung, F., Town, C.D., Cannon, S.B., Young, N.D., Abberton, M., Jenkins, G., Milbourne, D. 2007. Construction, characterization and preliminary BAC-end sequencing analysis of a bacterial artificial chromosome library of white clover (Trifolium repens L.). Genome. 50:412-421.


Progress 10/01/05 to 09/30/06

Outputs
Progress Report
1. What major problem or issue is being resolved and how are you resolving it (summarize project aims and objectives)? How serious is the problem? Why does it matter?
This project, 3625-21000-038-00D, Curation and Development of SoyBase and its Integration with Other Plant Genome Databases, is part of NP 301 Plant Genetic Resources, Genomics, and Genetics Improvement. New genomic projects being conducted across the United States have created a huge influx of data. In addition, it has become important to draw upon data generated in related legume species in order to leverage limited research dollars. Without an organized method for acquiring, cataloguing, and distributing this data, much of it would become only marginally useful, resulting in an inefficient use of public and private research dollars. Therefore, this work is relevant to scientists, students, agency administrators and the general public. SoyBase is evolving from an 'object oriented' database into a relational database scalable to handle all new types of data and updates. Portions of SoyBase are being directed toward breeder-specific issues such as maps, molecular markers, and agronomic traits. SoyBase is also developing close collaborations with database developers throughout the U.S. SoyBase staff are adopting state-of-the-art database tools and structures by keeping in touch with scientists working in this field and by collaborating with our colleagues at various universities and institutes. Data are being accumulated far faster than we can analyze them. Without a concerted effort to collect and catalog these data, they will soon be lost. The onset of the whole-genome sequencing effort for soybean is a perfect example. Without a concerted effort to develop analysis tools and useful user interfaces, the data will not be optimally utilized. This is important because many millions of dollars are spent each year on soybean and related legume research.
Without an easily accessible and public database, much of this research would be redundant. A dynamic and up-to-date database allows researchers to quickly peruse the literature and formulate new hypotheses to test.
2. List by year the currently approved milestones (indicators of research progress)
Objective 1: Mapping and Annotation
Year 1 - Complete initial annotation of non-genic DNA sequences. Initiate gene annotations for soybean using data from Arabidopsis and other model systems. Initiate placement of cDNAs onto BACs and initiate mapping of BACs and cDNAs.
Years 2 and 3 - Complete placement of initial cDNAs onto BACs. Complete genetic mapping of BAC contiglets. Continue gene annotation and refine analysis stream.
Years 4 and 5 - Complete refinement of graphical displays for maps, annotations and related biological data.
Objective 2: Curation
Year 1 - Complete initial data entry for data class 'Transformation'. Develop a list of data class priorities based upon user input. Work with ZmDB staff, NCGR, and U-MN to ensure smooth migration of curated data into future databases.
Years 2 and 3 - Continue data entry based on priorities list. Facilitate migration of data into new databases.
Years 4 and 5 - Continue data entry based on priorities list. Facilitate migration of data into new databases.
Objective 3: Legume Information System (NCGR collaboration)
Year 1 - Complete review of current and future bioinformatic needs (acquisition, curation, and data analyses).
Year 2 - Develop and beta-test software.
Year 3 - Complete the migration of SoyBase into LIS.
Years 4 and 5 - Continuance of database developments and upgrades. Identify and establish a permanent home for LIS.
4a List the single most significant research accomplishment during FY 2006.
This project, 3625-21000-038-00D, Curation and Development of SoyBase and its Integration with Other Plant Genome Databases, is part of NP 301 Plant Genetic Resources, Genomics and Genetic Improvement, and fits within Action Plan Component II, Crop Informatics, Genomics, and Genetic Analyses (Problem Statement 2A: Genome Database Stewardship and Information Tool Development). Physical Map: A genetically anchored physical map of chromosomes is essential for the isolation of genes underlying agronomically important Quantitative Trait Loci (QTL). SoyBase staff at the Corn Insect and Crop Genetics Research Unit, Ames, IA, have built a relational database to hold all of the physical and genetic data for soybean. This database is able to display the physical map overlaid onto the genetic map and is populated with agronomically important QTL. ARS staff also developed an online tutorial for the web-based map displays and databases. This database will be useful for the interpretation of whole-genome sequence data being generated by the Department of Energy.
5. Describe the major accomplishments to date and their predicted or actual impact.
This project, 3625-21000-038-00D, Curation and Development of SoyBase and its Integration with Other Plant Genome Databases, is part of NP 301 Plant Genetic Resources, Genomics and Genetic Improvement, and fits within Action Plan Component II, Crop Informatics, Genomics and Genetic Analyses (Problem Statement 2A: Genome Database Stewardship and Information Tool Development). Customers of this project include legume biologists, students, public and private researchers and administrators of granting agencies. Genetic and genomic data on soybean are dispersed throughout dozens of scientific journals, making it difficult to compile comprehensive datasets. This project has created a soybean genetic database that makes it simple to access genetic and genomic data.
Over the life of the project we have developed software programs to identify members of gene families and to identify SNPs within genes, among genotypes. Many new SSR loci were added with details about 5' primers, 3' primers, core motifs, allele size in various germplasms, pictures, and GenBank accession numbers. We interpolated markers from all other published mapping studies onto each linkage group, giving a total of over 3930 loci. QTL data were modified to fit the new map positions. Most categories of SoyBase have been migrated from the ACEdb structure into a relational database structure representing 175 interrelated data-type tables. Twenty-five data classes were moved to LIS. With help from the legume community, the literature was searched and a representative Phaseolus genetic map was compiled and sent to NCGR collaborators for entry into LIS. Genetic map data were formatted and loaded into CMap. The recent 'Williams 82' physical map has been fully integrated into CMap and overlaid onto the soybean genetic map. All mapped QTL are now associated with the genetic and physical maps. The project databases are accessed by approximately 50 individuals per day with an average of 1,000 pages served per day.
6. What science and/or technologies have been transferred and to whom? When is the science and/or technology likely to become available to the end-user (industry, farmer, other scientists)? What are the constraints, if known, to the adoption and durability of the technology products?
SoyBase is freely available over the internet. We have also provided complete datasets to the National Center for Genome Resources and to other industry and public sector scientists, and assisted them in setting up local versions of the database. We have developed and maintain the SoyBase home page, where several datasets from projects in progress are made available. Portions of SoyBase are now implemented in the Soybean Breeder's Toolbox, which will become public in late summer 2006.
Once data from the physical map project are published, the physical map component of SoyBase will become public and will be provided to LIS.
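
The report does not say how markers from other published maps were interpolated onto each linkage group. One common approach, assumed here purely for illustration, is linear interpolation between the two nearest markers shared by the source map and the reference map. The function name and the (source cM, reference cM) anchor pairs below are hypothetical.

```python
from bisect import bisect_right


def interpolate_position(source_pos, anchors):
    """Estimate a reference-map position (cM) for a marker on a source map.

    anchors is a list of (source_cM, reference_cM) pairs for markers present
    on both maps. Returns None when source_pos lies outside the anchored
    interval, since extrapolation is not attempted in this sketch.
    """
    points = sorted(anchors)
    if len(points) < 2:
        raise ValueError("at least two shared markers are required")
    source_positions = [p[0] for p in points]
    if source_pos < source_positions[0] or source_pos > source_positions[-1]:
        return None
    i = bisect_right(source_positions, source_pos)
    if i == len(points):
        # source_pos coincides with the last shared marker.
        return points[-1][1]
    (x0, y0), (x1, y1) = points[i - 1], points[i]
    # bisect_right guarantees x0 <= source_pos < x1, so x1 - x0 is non-zero.
    return y0 + (source_pos - x0) * (y1 - y0) / (x1 - x0)
```

Positions estimated this way are only as good as the assumption that marker order and relative spacing agree between the two maps within the flanking interval.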

Impacts
(N/A)

Publications

  • Nelson, R., Grant, D.M., Shoemaker, R.C. 2004. ESTminer: a suite of programs for gene and allele identification. Bioinformatics. 21(5):691-693.


Progress 10/01/04 to 09/30/05

Outputs
1. What major problem or issue is being resolved and how are you resolving it (summarize project aims and objectives)? How serious is the problem? Why does it matter?
New genomic projects being conducted across the United States have created a huge influx of data. In addition, it has become important to draw upon data generated in related legume species in order to leverage limited research dollars. Without an organized method for acquiring, cataloguing, and distributing this data, much of it would become only marginally useful, resulting in an inefficient use of public and private research dollars. SoyBase is evolving from an 'object oriented' database into a relational database scalable to handle all new types of data and updates. SoyBase is also developing close collaborations with database developers throughout the U.S. SoyBase staff are adopting state-of-the-art database tools and structures by keeping in touch with scientists working in this field and by collaborating with our colleagues at various universities and institutes.
How serious is the problem? Why does it matter?
Data are being accumulated far faster than we can analyze them. Without a concerted effort to collect and catalog these data, they will soon be lost. Without a concerted effort to develop analysis tools and useful user interfaces, the data will not be optimally utilized. This is important because many millions of dollars are spent each year on soybean and related legume research. Without an easily accessible and public database, much of this research would be redundant. A dynamic and up-to-date database allows researchers to quickly peruse the literature and formulate new hypotheses to test.
2. List the milestones (indicators of progress) from your Project Plan.
Objective 1: Mapping and Annotation
Year 1 - Complete initial annotation of non-genic DNA sequences. Initiate gene annotations for soybean using data from Arabidopsis and other model systems.
Initiate placement of cDNAs onto BACs and initiate mapping of BACs and cDNAs.
Years 2 and 3 - Complete placement of initial cDNAs onto BACs. Complete genetic mapping of BAC contiglets. Continue gene annotation and refine analysis stream.
Years 4 and 5 - Complete refinement of graphical displays for maps, annotations and related biological data.
Objective 2: Curation
Year 1 - Complete initial data entry for data class 'Transformation'. Develop a list of data class priorities based upon user input. Work with ZmDB staff, NCGR, and U-MN to ensure smooth migration of curated data into future databases.
Years 2 and 3 - Continue data entry based on priorities list. Facilitate migration of data into new databases.
Years 4 and 5 - Continue data entry based on priorities list. Facilitate migration of data into new databases.
Objective 3: Legume Information System (NCGR collaboration)
Year 1 - Complete review of current and future bioinformatic needs (acquisition, curation, and data analyses).
Year 2 - Develop and beta-test software.
Year 3 - Complete the migration of SoyBase into LIS.
Years 4 and 5 - Continuance of database developments and upgrades. Identify and establish a permanent home for LIS.
3a List the milestones that were scheduled to be addressed in FY 2005. For each milestone, indicate the status: fully met, substantially met, or not met. If not met, why.
1. Objective 1) The annotation of non-genic (repetitive) soybean sequences, the annotation of soybean genic sequences using Arabidopsis and other model systems, and the placement of cDNAs and the mapping of some BAC contiglets. Milestone Fully Met.
2. Objective 2) Development of a list of data class priorities, increased cooperation of other database personnel, and migration of data into new databases. Milestone Fully Met.
3. Objective 3) Continued review and assessment of current and future bioinformatic needs (acquisition, curation and data analysis), and migration of SoyBase into LIS.
Milestone Substantially Met.
3b List the milestones that you expect to address over the next 3 years (FY 2006, 2007, and 2008). What do you expect to accomplish, year by year, over the next 3 years under each milestone?
In FY 2006, Objective 1) We will complete the first phase of cDNA mapping and gene annotation. During this FY SoyBase staff will increase the amount of interaction with the Gene Ontology working group. Objective 2) We will develop, or test existing, third-party annotation systems with NCGR, U-MN and ZmDB cooperation that will facilitate the migration of new data into databases. Objective 3) We will complete the migration of most relevant SoyBase data into the Legume Information System.
In FY 2007, Objective 1) We will complete the refinement of graphical displays for maps, gene annotations and related biological data such as metabolic pathways. Objective 2) We will continue data entry into public databases pursuant to data priorities. Objective 3) We will continue database upgrades and will identify a permanent home for the Legume Information System.
In FY 2008, Objective 1) We will have established databases and applications for gene expression data and will be working on integrating expression data with genetic map data. Objective 2) We will continue data entry into public databases pursuant to data priorities established by community user groups. Objective 3) We will continue database developments and upgrades and adapt as technologies and data types change.
4a What was the single most significant accomplishment this past year?
Translational Genomics: The ability to translate genetic progress from one species to another is critical to rapid advancement in crop performance. The Corn Insects and Crop Genetics Research Unit, Ames, Iowa extracted genetic maps of soybean and common bean from public databases and the literature.
The data were compiled and formatted for CMap, a program that allows side-by-side comparisons of chromosomal maps of any species. This permits the direct transfer of genetic progress in one species to another, without the time and expense required for the additional research. For example, genes for Asian soybean rust resistance in common bean may now be mapped, and the information can be immediately translated to soybean.
4b List other significant accomplishments, if any.
Other significant accomplishments in FY2005 included the movement of 25 classes of SoyBase data into LIS, and the migration of nine other data classes into MySQL as part of the change from the current SoyBase structure into a relational database structure.
5. Describe the major accomplishments over the life of the project, including their predicted or actual impact.
This project, 3625-21000-038-00D, Curation and Development of SoyBase and its Integration with Other Plant Genome Databases, is part of NP 301 Plant, Microbial, and Insect Genetic Resources, Genomics and Genetic Improvement, and fits within Action Plan Component III, Bioinformatics and Genome Databases. Over the life of the project we have developed software programs to identify members of gene families and to identify SNPs within genes, among genotypes. Many new SSR loci were added with details about 5' primers, 3' primers, core motifs, allele size in various germplasms, pictures, and GenBank accession numbers. We interpolated markers from all other published mapping studies onto each linkage group, giving a total of over 3930 loci. QTL data were modified to fit the new map positions. Progress is being made in transferring the data from the ACEdb structure into a relational database structure, and nine data classes are being beta-tested. Twenty-five data classes have been moved to LIS.
With help from the legume community, the literature was searched and a representative Phaseolus genetic map was compiled and sent to NCGR collaborators for entry into LIS. Genetic map data were formatted and loaded into CMap.
6. What science and/or technologies have been transferred and to whom? When is the science and/or technology likely to become available to the end-user (industry, farmer, other scientists)? What are the constraints, if known, to the adoption and durability of the technology products?
SoyBase is freely available over the internet. We have also provided complete datasets to NCGR and to other industry and public sector scientists, and assisted them in setting up local versions of the database. We have developed and maintain the SoyBase home page, where several datasets from projects in progress are made available. These include information on commodity-funded physical map and transcript map development. Once the data from these projects are published, they will be moved into SoyBase and LIS.

Impacts
(N/A)

Publications

  • Nelson, R., Grant, D.M., Shoemaker, R.C. 2004. ESTminer: a suite of programs for gene and allele identification. Bioinformatics. 21(5):691-693.


Progress 10/01/03 to 09/30/04

Outputs
1. What major problem or issue is being resolved and how are you resolving it (summarize project aims and objectives)? How serious is the problem? Why does it matter?
New genomic projects being conducted across the United States have created a huge influx of data. In addition, it has become important to draw upon data generated in related legume species in order to leverage limited research dollars. Without an organized method for acquiring, cataloguing, and distributing this data, much of it would become only marginally useful, resulting in an inefficient use of public and private research dollars. SoyBase is evolving from an 'object oriented' database into a relational database scalable to handle all new types of data and updates. SoyBase is also developing close collaborations with database developers throughout the U.S. SoyBase staff are adopting state-of-the-art database tools and structures by keeping in touch with scientists working in this field and by collaborating with our colleagues at various universities and institutes.
How serious is the problem? Why does it matter?
Data are being accumulated far faster than we can analyze them. Without a concerted effort to collect and catalog these data, they will soon be lost. Without a concerted effort to develop analysis tools and useful user interfaces, the data will not be optimally utilized. This is important because many millions of dollars are spent each year on soybean and related legume research. Without an easily accessible and public database, much of this research would be redundant. A dynamic and up-to-date database allows researchers to quickly peruse the literature and formulate new hypotheses to test.
2. List the milestones (indicators of progress) from your Project Plan.
Objective 1: Mapping and Annotation
Year 1 - Complete initial annotation of non-genic DNA sequences. Initiate gene annotations for soybean using data from Arabidopsis and other model systems.
Initiate placement of cDNAs onto BACs and initiate mapping of BACs and cDNAs.
Years 2 and 3 - Complete placement of initial cDNAs onto BACs. Complete genetic mapping of BAC contiglets. Continue gene annotation and refine analysis stream.
Years 4 and 5 - Complete refinement of graphical displays for maps, annotations and related biological data.
Objective 2: Curation
Year 1 - Complete initial data entry for data class 'Transformation'. Develop a list of data class priorities based upon user input. Work with ZmDB staff, NCGR, and U-MN to ensure smooth migration of curated data into future databases.
Years 2 and 3 - Continue data entry based on priorities list. Facilitate migration of data into new databases.
Years 4 and 5 - Continue data entry based on priorities list. Facilitate migration of data into new databases.
Objective 3: Legume Information System (NCGR collaboration)
Year 1 - Complete review of current and future bioinformatic needs (acquisition, curation, and data analyses).
Year 2 - Develop and beta-test software.
Year 3 - Complete the migration of SoyBase into LIS.
Years 4 and 5 - Continuance of database developments and upgrades. Identify and establish a permanent home for LIS.
3. Milestones:
A. List the milestones that were scheduled to be addressed in FY 2004. How many milestones did you fully or substantially meet in FY 2004 and indicate which ones were not fully or substantially met, briefly explain why not, and your plans to do so.
The milestones that were scheduled to be met in FY2004 include: Objective 1): the annotation of non-genic (repetitive) soybean sequences, the initial annotation of soybean genic sequences using Arabidopsis and other model systems, and the initial placement of cDNAs onto BACs. Objective 2): completion of the initial data entry for the class "transformation", development of a list of data class priorities, and increased cooperation of other database personnel.
Objective 3): completion of a review and assessment of current and future bioinformatic needs (acquisition, curation and data analysis).
Most milestones from each of the three objectives were met in full. Data models for the class Transformation have been developed, and movement of SoyBase data into this structure is under way. We anticipate completing this phase by 1 December 2004. Annotation of soybean ESTs was initiated using known Arabidopsis and Gene Ontology (GO) annotations. Most GO annotations are built from mammalian or other non-plant systems, and we determined that GO annotations are lacking for many plant genes. An initial 50 seed-related cDNAs have been assigned to BACs using overgo technology. Working with personnel from the ZmDB database, we developed the relational structure for transformation data and began migrating SoyBase data into the relational database. A review of current and future legume data needs was carried out during a workshop at NCGR, Santa Fe, in November 2003. This evaluation included input from ZmDB and U-MN personnel. Data needs and priorities were established.
B. List the milestones that you expect to address over the next 3 years (FY 2005, 2006, and 2007). What do you expect to accomplish, year by year, over the next 3 years under each milestone?
In FY 2005, Objective 1): We will continue to place cDNAs onto BAC clones. We will continue to enter data into a new relational database pursuant to data priorities. Objective 2): We will continue to refine gene annotations and to ensure these annotations are entered into the correct databases. Objective 3): We will begin the development and testing of software developed as part of visualization needs or data analysis priorities. Software developed by SoyBase personnel in Year 1 will be moved to NCGR for beta-testing.
In FY 2006, Objective 1): We will complete the first phase of cDNA mapping and gene annotation.
During this fiscal year, SoyBase staff will increase their interaction with the Gene Ontology working group. Objective 2): We will develop third-party annotation systems with NCGR, U-MN and ZmDB cooperation that will facilitate the migration of new data into databases. Objective 3): We will complete the migration of most SoyBase data into the Legume Information System.
In FY 2007, Objective 1): We will complete the refinement of graphical displays for maps, gene annotations and related biological data such as metabolic pathways. Objective 2): We will continue data entry into public databases pursuant to data priorities. Objective 3): We will continue database upgrades and will identify and establish a permanent home for the Legume Information System.
4. What were the most significant accomplishments this past year?
A. Single Most Significant Accomplishment during FY 2004.
The single most significant accomplishment in FY 2004 was the analysis of soybean ESTs to identify members of gene families and the polymorphisms that define each locus. We grouped ESTs into gene families and identified unique polymorphisms that distinguish each family member. This was followed by the identification of SNPs (alleles) between homologs in the germplasm used to construct the EST libraries.
B. Other Significant Accomplishment(s).
Other significant accomplishments in FY 2004 included the development of software to aid in assigning GO (Gene Ontology) annotations to soybean genes, and the addition of Phaseolus genetic maps to LIS with links to soybean data where appropriate.
C. Significant accomplishments/activities that support special target populations.
None.
5. Describe the major accomplishments over the life of the project, including their predicted or actual impact.
This project, 3625-21000-038-00D, Curation and Development of SoyBase and its Integration with Other Plant Genome Databases, is part of NP 301 Plant, Microbial, and Insect Genetic Resources, Genomics and Genetic Improvement, and fits within Action Plan Component III, Bioinformatics and Genome Databases. Over the life of the project we have developed software programs to identify members of gene families and to identify SNPs within genes among genotypes. Many new SSR loci were added with details about 5' primers, 3' primers, core motifs, allele size in various germplasms, pictures, and GenBank accession numbers. We interpolated markers from all other published mapping studies onto each linkage group, giving a total of over 3930 loci. QTL data were modified to fit the new map positions. Progress is being made in transferring the data from the ACEdb structure into a relational database structure. With help from the legume community, the literature was searched and a representative Phaseolus genetic map was compiled and sent to NCGR collaborators for entry into LIS.
6. What science and/or technologies have been transferred and to whom? When is the science and/or technology likely to become available to the end user (industry, farmer, other scientists)? What are the constraints, if known, to the adoption and durability of the technology products?
SoyBase is freely available over the internet. We have also provided complete datasets to NCGR and to other industry and public-sector scientists, and have assisted them in setting up local versions of the database. We have developed and maintain the SoyBase home page, where several datasets from projects in progress are made available. These include information on commodity-funded physical map and transcript map development. Once the data from these projects are published, they will be moved into SoyBase and LIS.

Impacts
(N/A)

Publications