Source: AGRICULTURAL RESEARCH SERVICE submitted to
A STRATEGY FOR RESPONDING TO THE WHOLE-GENOME SHOTGUN SEQUENCE OF THE SOYBEAN GENOME
Sponsoring Institution
Agricultural Research Service/USDA
Project Status
TERMINATED
Funding Source
Reporting Frequency
Annual
Accession No.
0412038
Grant No.
(N/A)
Project No.
3625-21000-052-01R
Proposal No.
(N/A)
Multistate No.
(N/A)
Program Code
(N/A)
Project Start Date
Jul 1, 2007
Project End Date
Apr 30, 2011
Grant Year
(N/A)
Project Director
SHOEMAKER R C
Recipient Organization
AGRICULTURAL RESEARCH SERVICE
(N/A)
AMES,IA 50010
Performing Department
(N/A)
Non Technical Summary
(N/A)
Animal Health Component
(N/A)
Research Effort Categories
Basic
90%
Applied
10%
Developmental
0%
Classification

Knowledge Area (KA)    Subject of Investigation (SOI)    Field of Science (FOS)    Percent
201                    1820                              1080                      90%
204                    2499                              1080                      10%
Goals / Objectives
1) Assemble the DNA sequence and connect it to the soybean genetic 'road map'. 2) Additional SNP genetic markers will be developed and mapped.
Project Methods
Whole genome sequence (WGS) will be overlaid onto the physical and genetic map. Paired ends will be identified that indicate DNA clones spanning gaps in the genome sequence. BAC end sequences will be aligned to the WGS, thus overlaying the sequence onto the physical map. Genome sequence from PI 468916 will be aligned with the WGS of Williams 82 and SNPs will be identified. Information will be provided to collaborators at USDA-ARS, Beltsville, and SNP markers will be developed and mapped. A 200-SSR map infrastructure will be created in a cross population between Williams 82 and PI 468916. Approximately 4,000 SNPs will be mapped. The DNA sequence from which the SNPs were detected will be correlated with the WGS, thus anchoring the sequence map to the genetic map. Data will be entered into public soybean genome databases.
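As a rough illustration of the SNP identification step described above, the following Python sketch compares two sequences that have already been aligned and reports single-base mismatches. The function name, the toy sequences, and the gap symbol are assumptions made for illustration only; this report does not describe the alignment or SNP-calling software the project actually used.

```python
def find_snps(ref_aligned, alt_aligned, gap="-"):
    """Return (reference_position, ref_base, alt_base) for single-base mismatches.

    Both inputs must be the same length (already aligned). Positions are
    1-based coordinates on the ungapped reference sequence.
    """
    if len(ref_aligned) != len(alt_aligned):
        raise ValueError("aligned sequences must have equal length")
    snps = []
    ref_pos = 0
    for ref_base, alt_base in zip(ref_aligned.upper(), alt_aligned.upper()):
        if ref_base != gap:
            ref_pos += 1  # advance only when the reference has a base here
        # Insertions, deletions, and ambiguous bases are not SNPs; skip them.
        if gap in (ref_base, alt_base) or "N" in (ref_base, alt_base):
            continue
        if ref_base != alt_base:
            snps.append((ref_pos, ref_base, alt_base))
    return snps


# Hypothetical toy alignment (not real Williams 82 or PI 468916 sequence).
reference = "ACGTTGCA-ACGT"
other = "ACGATGCATAC-T"
print(find_snps(reference, other))
```

In this toy alignment only the fourth reference base differs; the two gapped columns are ignored because they represent an insertion and a deletion rather than a single-base substitution.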

Progress 07/01/07 to 04/30/11

Outputs
During the reporting period, ARS scientists at Ames, IA and Beltsville, MD improved the assembly of the whole-genome sequence of soybean. Analysis continued on the data from genotyping recombinant inbred lines of Williams 82 x Glycine soja PI 479752 with the 50,000 single nucleotide polymorphism (SNP) markers on the Illumina Soybean GeneChip. An additional set of recombinant inbred lines was used, which will bring the population size to nearly 1,100 lines.
The team investigated a region of chromosome 11 that may reflect either an assembly error or a chromosomal inversion in some soybean lines. Working with researchers at Virginia, the team identified a 1.4 Mb region at the end of chromosome 11 where three new genetic maps show inverted marker orders with respect to the current assembly. Adjacent to this region, evidence was found that approximately 50,000 bases are missing from the current assembly. Although this is a very small portion of the full genome (0.005%), mapping studies show that the missing sequence contains an important gene, one that is substantially responsible for soybean seed phytate and stachyose levels. Both stachyose and phytate are major anti-nutritional compounds, so resolving and correcting this region of the genome assembly is important.
Progress on this project has been documented through written quarterly reports, discussions at conferences, and conference calls.
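The kind of marker-order conflict described for chromosome 11 can be sketched in a few lines of Python: markers are sorted by their position in the assembly, and any run in which genetic position falls as physical position rises is flagged. The marker names, coordinates, and the `min_markers` threshold below are hypothetical, and the check alone cannot distinguish a misassembly from a true chromosomal inversion.

```python
def inverted_runs(markers, min_markers=3):
    """Find runs of markers whose genetic order is reversed in the assembly.

    markers: list of (name, physical_bp, genetic_cM) on one chromosome.
    Returns a list of (start_bp, end_bp, [marker names]) for each run of at
    least `min_markers` consecutive markers with strictly decreasing cM.
    """
    ordered = sorted(markers, key=lambda m: m[1])  # sort by assembly position
    runs = []
    current = []
    for marker in ordered:
        if current and marker[2] < current[-1][2]:
            current.append(marker)  # cM decreased: extend the reversed run
        else:
            if len(current) >= min_markers:
                runs.append(current)
            current = [marker]  # start a new candidate run
    if len(current) >= min_markers:
        runs.append(current)
    return [(run[0][1], run[-1][1], [m[0] for m in run]) for run in runs]


# Hypothetical markers: m4..m6 map in the opposite order to the assembly.
toy_markers = [
    ("m1", 100_000, 10.0),
    ("m2", 300_000, 12.0),
    ("m3", 500_000, 14.0),
    ("m4", 700_000, 20.0),
    ("m5", 900_000, 18.0),
    ("m6", 1_100_000, 16.0),
    ("m7", 1_300_000, 22.0),
]
print(inverted_runs(toy_markers))
```

Requiring several consecutive reversed markers is a deliberate choice in this sketch: a single out-of-order marker is more likely to be a genotyping or mapping error than evidence of an inverted segment.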

Impacts
(N/A)

Publications


    Progress 10/01/09 to 09/30/10

    Outputs
    During the reporting period, ARS scientists at Ames, IA and Beltsville, MD improved the assembly of the whole-genome sequence of soybean. The soybean genome sequence was published in the January 14, 2010 issue of the journal Nature. The Ames group used the physical map and sequence analysis to identify areas of the genome assembly that can be improved in subsequent assembly versions. Using high-powered microscopy and collaborating with scientists at the University of Missouri and Purdue University, they identified a portion of chromosome 13 (linkage group F) that will need to be revised. They mapped genomic sequences from several perennial species related to soybean onto the soybean genome sequence.
    The team also increased the number of sequence-based molecular markers. Sequence-based and mapped markers help ensure a high-quality assembly of the genome sequence. All information related to 33,065 new genetic (Simple Sequence Repeat) markers is being made available on a website to give soybean breeders and geneticists ready access: http://www.soybase.org/. In addition, the entire BARCSOYSSR_1.0 database can be downloaded from USDA, Beltsville at: http://bldg6.arsusda.gov/~pooley/soy/cregan/BARCSOYSSR_1.0.html.
    Other progress includes the genotyping of a portion (374 of the 1,000 lines) of the Williams 82 x PI 468916 recombinant inbred line (RIL) mapping population with the 50,000 genetic (SNP) markers on the new Soybean GeneChip. These data are still being analyzed, so it is not yet clear how many of the markers are segregating in the population. Progress on this project has been documented through written quarterly reports and conference calls.

    Impacts
    (N/A)

    Publications


      Progress 10/01/08 to 09/30/09

      Outputs
      Significant Activities that Support Special Target Populations
      A key to the genome assembly was the extensive genetic map, which was extended substantially through this project. The total assembled and anchored genome sequence comprises 949.7 million bases in 20 chromosome sequences. The genome contains more than 46,000 predicted genes. The large primary sequence "scaffold assemblies" that make up the pseudo-chromosomal sequences are anchored with 5,883 genetic markers. These help determine the relative positions and orientations of the scaffold assemblies on the 20 chromosome sequences. We identified 38 scaffold sequences for further mapping efforts by ARS collaborators in Beltsville, MD. The mapping population consists of 972 F5-6 Recombinant Inbred Lines (RIL).
      Thirty-nine supercontigs of the 8X soybean genome assembly that had "assembly issues" were analyzed for the presence of useful Simple Sequence Repeats (SSR). It was determined that super_275 was duplicated and that super_187 is a duplicate of scaffold_22. Of the 37 remaining supercontigs with assembly problems, primer sets were designed to SSRs in 36. Of the 181 primer sets designed to these 36 supercontigs, 48 were determined to be polymorphic and were mapped in the Williams x PI468916 RIL population.
      The Gmax1.01 soybean genome assembly has a total of 40 floating sequence scaffolds with sequence length ≥50,000 base pairs (bp). However, analysis of the sequence in these scaffolds showed that unique SSR primer sets with a high likelihood of success could be designed to only 20 of the 40 scaffolds. Of the 61 primer sets designed to these 20 floating scaffolds, 29 markers were determined to be polymorphic between Williams 82 and PI468916. Mapping data from 19 markers were generated from the Williams 82 x PI 468916 RIL population.
      We also prepared a genome browser for public access to the genome sequence and features. This was made public in mid-December 2008, when the genome sequence and annotations were released to the public by the sequencing consortium. The genome browser includes tools for name-based and sequence-based searches, as well as many features of interest to soybean researchers, including predicted genes and similarities to genes in ten other legume species. Features in the browser also link to the genetic map resources in the Soybean Breeder's Toolbox, which will enable breeders and researchers to move back and forth between genetically mapped regions of known agronomic traits (Quantitative Trait Loci) and sequence regions with predicted genes. Progress on this project has been documented through written reports and conference calls.

      Impacts
      (N/A)

      Publications


        Progress 10/01/07 to 09/30/08

        Outputs
        Significant Activities that Support Special Target Populations
        Approximate ordering and orienting (O&O) of most sequence scaffolds was carried out using a consensus genetic map with 5,503 markers. Duplicated regions in the sequence assembly helped to identify potential assembly problem areas or areas for which additional evaluation is needed. We evaluated the assembly using genomic landmarks such as telomeric and centromeric repeats and ribosomal DNA. We used the curated Williams physical map, and additional clone pairs not used in the primary assembly, to identify additional scaffold associations and potential misassemblies. We are approaching a genome sequence in which more than 96% of the WGS sequence, and virtually all of the gene-containing sequence, has been ordered, oriented, and positionally validated. We refined the O&O by constructing a special-purpose high-resolution map in a mapping population of 444 progeny of a G. max (Williams 82) x G. soja (PI 468916) cross, with markers selected from potentially problematic regions in the 7x draft WGS.
        DNA was extracted from leaf tissue of more than 576 single-seed-descent (SSD) derived F5 plants from Williams 82 x PI 468916 to permit analysis of the population with SSR markers. Solexa sequencing was conducted on PI 468916 and the sequence was overlaid onto the Williams 82 genomic sequence to identify SNPs. New SNP markers (1,536) were analyzed in 460 F5-derived RILs of the Williams 82 x PI 468916 population. These SNPs were drawn from the more than 25,000 SNPs discovered from the Solexa sequencing. Of the 1,536 markers, 1,243 gave successful assays (81% success rate). SNPs were carefully selected to anchor the sequence to the genetic map. A total of 551 SNPs that had previously been positioned on the map were mapped on the Williams 82 x PI 468916 population. This is an average of 27.5 markers per linkage group (LG) and allowed us to tie the new Williams 82 x PI 468916 map to the Soybean Consensus Map.
        A total of 106 SSR markers were genotyped in each of 470 F5-derived RILs from the Williams 82 x PI 468916 cultivated x wild soybean mapping population. The 106 markers included SSR markers that have been previously mapped. To date, 71 SSRs from the current Soybean Consensus Map have been genotyped in the population. These SSR primers were selected from sequence scaffolds that were either not anchored to the Soybean Consensus Map or were unoriented.
        This project is part of NP 301 Plant Genetic Resources, Genomics, and Genetic Improvement, and fits within Action Plan Component 2, Crop Informatics, Genomics and Genetic Analyses, Problem Statement 2B: Structural Comparison and Analysis of Crop Genomes. Progress on this project has been documented through written reports and conference calls.
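The ordering and orienting (O&O) idea described above can be sketched as follows: each scaffold is placed by the mean genetic position of its markers, and its orientation is taken from whether genetic position rises or falls along the scaffold. This is a minimal illustration under assumed inputs; the scaffold names, coordinates, and the covariance-based orientation rule are hypothetical and are not taken from the project's actual pipeline.

```python
from statistics import mean


def order_and_orient(scaffolds):
    """Order scaffolds along one linkage group and guess their orientation.

    scaffolds: dict mapping scaffold name -> list of (offset_bp, genetic_cM)
    for markers located on that scaffold.
    Returns [(name, orientation)] sorted by mean cM, where orientation is
    '+' (cM rises with offset), '-' (cM falls), or '?' (cannot be decided).
    """
    placed = []
    for name, hits in scaffolds.items():
        if not hits:
            continue  # no markers: the scaffold stays unanchored ("floating")
        mean_offset = mean(offset for offset, _ in hits)
        mean_cm = mean(cm for _, cm in hits)
        # Sign of the covariance between offset and cM gives the direction.
        cov = sum((o - mean_offset) * (c - mean_cm) for o, c in hits)
        orientation = "+" if cov > 0 else "-" if cov < 0 else "?"
        placed.append((mean_cm, name, orientation))
    placed.sort()
    return [(name, orientation) for _, name, orientation in placed]


# Hypothetical scaffolds with marker hits as (offset on scaffold, cM).
toy_scaffolds = {
    "scaffold_B": [(2_000, 30.0), (80_000, 24.0)],
    "scaffold_A": [(1_000, 5.0), (50_000, 7.5), (90_000, 9.0)],
    "scaffold_C": [(40_000, 15.0)],
    "scaffold_D": [],
}
print(order_and_orient(toy_scaffolds))
```

A scaffold with a single marker can be positioned but not oriented, and one with no markers cannot be placed at all, which is why the report describes selecting new SSR markers specifically from unanchored and unoriented scaffolds.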

        Impacts
        (N/A)

        Publications


          Progress 10/01/06 to 09/30/07

          Outputs
          Significant Activities that Support Special Target Populations
          This report documents research conducted under a reimbursable agreement between ARS and the United Soybean Board. Additional details of research are in the report for the parent project 3625-21000-038-00D, Curation and Development of SoyBase and its Integration with other Plant Genome Databases. This project is part of NP 301 Plant Genetic Resources, Genomics and Genetics Improvement, and fits within Action Plan Component II, Crop Informatics, Genomics and Genetic Analyses, Problem Statement 2A: Genome Database Stewardship and Informatics Tool Development, and 2B: Structural Comparison and Analysis of Crop Genomes. Progress on this project will be monitored through written progress reports, conference calls, email contact and personal visits.
          During the reporting period, a new SCA with Purdue University was initiated and recruitment of an ARS post-doctoral researcher was begun. An initial assessment of the 4X genome assembly was conducted using sequence-based genetic markers.

          Impacts
          (N/A)

          Publications