Progress 09/01/09 to 08/31/13
Outputs
Target Audience: Barley researchers and others may access, download and use data, maps and assemblies generated by this project through freely accessible portals including http://harvest.ucr.edu, www.harvest-blast.org, www.harvest-web.org, http://triticeaetoolbox.org, http://trace.ncbi.nlm.nih.gov/Traces/sra, and through other portals posted on www.barleygenome.org, which is the website of the International Barley Sequencing Consortium.
Changes/Problems: The original proposal for this project ended with a list of alternative, highly optimistic objectives ("Plan B"), anticipating reduced sequencing costs and possible opportunities to leverage other work. The review panel looked favorably on adjusting the work plan to take advantage of such developments. Sequencing costs did in fact decline significantly, so all plans were adjusted upward in year 2. The major adjustments were: 1) the low-copy portion of the genome was identified by shotgun sequencing instead of high Cot fractionation, 2) the transcriptome was sequenced more deeply and from more libraries than originally planned, 3) hypomethylated fragments were sequenced more deeply, and 4) the complete set of about 15,600 gene-bearing minimal tiling path BACs was sequenced instead of only about 2,000 previously mapped BACs. Additional enhancements aimed to maximize the number of sequenced BACs and sequence scaffolds that can be anchored to the genetic map. These included: 1) increasing the resolution of the genetic linkage map and improving marker order by gathering existing data from other projects to create a new consensus map, 2) an estimated 20X sequencing of chromosome arms to enable allocation of sequenced BACs and genome shotgun sequences to arms, 3) use of low-coverage sequencing to assign tens of thousands of sequence strings within sequenced BACs to low-resolution map bins, and 4) cross-sharing of information between this project and European barley genome initiatives.
Another major change was to extend the project to 4 years via no-cost extensions. The extension enabled much more data collection than originally planned, orderly databasing beyond the first two years, further sharing and release, and the preparation of publications after the first two years.
What opportunities for training and professional development has the project provided? The following trainees have been part of this project, with brief descriptions of their training:
- Matthew Alpert. UC Riverside, undergraduate, then MSc student in Computer Sciences. Genome sequence assembly and assembly analysis.
- Denisa Duma. UC Riverside, PhD student in Computer Sciences, from Romania. Data simulations, algorithm development for combinatorial sequencing including error correction.
- Burair Alsaihati. UC Riverside, MSc student in Computer Sciences, from Saudi Arabia. Data simulations, algorithm development for combinatorial sequencing.
- Seyed Mirebrahim. UC Riverside, MSc student in Computer Sciences, from Iran. Comparisons of BAC assemblies from combinatorial sequencing with BACs sequenced by other methods.
- Rachid Ounit. UC Riverside, PhD student in Computer Sciences, from France. Algorithm to define chromosome-, arm- and centromere-specific k-mers and apply them to sequenced BACs for allocation of BACs to each k-mer source.
- Francesca Cordero. UC Riverside visiting PhD scientist from University of Torino, Italy, Computer Sciences. Algorithm development for combinatorial sequencing.
- Marco Beccuti. UC Riverside visiting PhD scientist from University of Torino, Italy, Computer Sciences. Algorithm development for combinatorial sequencing.
- Yaqin Ma. UC Riverside, Visiting Research Scientist, from China. Preparation of libraries for sequencing.
- Heather Roberson. UC Riverside, Laboratory Assistant. General wet lab support.
- Maria Munoz. U Minnesota, post-doc. New SNP consensus map.
- Matthew Moscou. Sainsbury Laboratory, post-doc. New SNP consensus map.
- Qihui Zhu. U Georgia, post-doc. Analysis of hypomethylated partially restricted DNA.
How have the results been disseminated to communities of interest? In addition to the dissemination actions listed under "Products" and "Other Products", members of our group have made oral and poster presentations at several scientific meetings and have given research seminars at a number of locations during the reporting period. One example is an oral presentation by PhD student Denisa Duma, "Combinatorial Pooling Enables Selective Sequencing of the Barley Gene Space", at the ISMB/ECCB meeting, July 18-23, 2013, in Berlin, Germany.
What do you plan to do during the next reporting period to accomplish the goals?
Nothing Reported
Impacts
What was accomplished under these goals?
Impact of this project. Barley has been a favorite target of geneticists for decades and is still a major crop in the USA, where it is used for animal feed, as a source of malt for the brewing industry, as a food ingredient for human consumption and as a biofuel feedstock. Its long history of genetic studies has resulted in the collection and characterization of thousands of mutants and in genetic maps containing an aggregate of tens of thousands of molecular markers. Over 27,000 accessions of Hordeum vulgare ssp. vulgare and accessions from 30 other Hordeum species are available at the National Small Grains Collection at Aberdeen, ID. Barley geneticists have mapped loci for over 150 quantitative phenotypes, including agronomic performance, resistance to biotic and abiotic stresses, malting quality traits, and physiological and morphological traits. The genetic maps, mutants, germplasm resources and QTL provide excellent resources that are being exploited by barley breeders using technologies based on genome knowledge. To support this progression, the International Barley Sequencing Consortium (IBSC) has established near- and long-term priorities for barley genomics (http://barleygenome.org). The IBSC includes members in the US, Germany, UK, Finland, Australia, Japan, China and Denmark. The short-term priority was to sequence all barley genes, a feat that was largely accomplished and summarized in our paper published in the journal Nature in 2012. The longer-term priority is a complete genome sequence, with an interim goal of a physical map anchored to the genetic map. This USDA-NIFA project has become the largest source of direct information supporting the goal of anchoring the physical map to the genetic map, by providing sequences of gene-enriched fragments of the barley genome.
In essence, this project has built upon existing resources developed from several USDA and NSF projects led by the investigators, in coordination with the IBSC, to improve knowledge of where genes are positioned within the barley genome. This project has resolved the positions of about 70% of all barley genes through direct evidence obtained by sequencing large fragments of the barley genome, and has provided access to this information before it became available from any other source.
Specific Objectives. Objective Set 1: Determine the sequences of nearly all Morex barley genes, including 5' and 3' flanking regions.
1A. Lower copy fraction of the genome. The plan to apply Roche 454 sequencing to generate 1 Gb of physically fractionated high Cot DNA was replaced by whole-genome shotgun sequencing using Illumina methods. During years 2 and 3 we accumulated DNA sequences from several Illumina runs, then generated a series of assemblies and provided public BLAST access to each new version through www.harvest-blast.org and download through www.harvest-web.org. As of December 2011, 164 Gb of adaptor- and quality-trimmed sequences (31X genome coverage) had been assembled using SOAPdenovo to produce "Barley Genome 0.05". Analysis of k-mer frequencies showed that the assembly was strongly biased toward low-copy sequences, as hoped. The v.0.05 assembly size of 1.19 Gb represents only 22% of the total 5.3 Gb genome, but BLAST searches show that it includes >90% of all previously identified barley gene sequences. This gene enrichment method was very successful.
1B. Hypomethylated partially restricted DNA. Morex nuclear DNA was partially digested with two methyl-sensitive restriction endonucleases, then the smaller fragments were isolated and sequenced using Illumina methods. At the time of this report, analysis of the results had not been performed (U Georgia co-PI Bennetzen).
1C. Diverse cDNA libraries.
The plan for one full run of Roche 454 sequencing to generate 0.5 Gb of sequence from two Morex cDNA libraries was upgraded to one lane of Illumina 2x75, providing ~3 Gb of sequence data from a total of five multiplexed Morex cDNA libraries. The sequences were assembled using Velvet/Oases and posted on www.harvest-blast.org and www.harvest-web.org in Summer 2011 for public access. The assembly contains sequences from 70.3% of all previously known barley transcripts. This gene enrichment method was reasonably successful and helped with gene annotation, but added very few sequences that were not already included in the genome shotgun assembly. This transcriptome dataset was subsumed within a larger transcriptome sequencing effort that was included in our (IBSC) Nature 2012 publication.
Objective Set 2: Sequence gene-bearing BACs from an existing genome-wide minimal tiling path (MTP). The initial restriction to ~2000 previously mapped BACs using both Roche 454 and Illumina sequencing was replaced by sequencing the entire gene-bearing MTP. This amounted to approximately 15,600 BACs, organized into 714 combinatorial pools. As of November 2013, all 714 BAC pool libraries had been sequenced at least once, and about 200 pools had been replaced or sequenced a second time to compensate for technical problems, mainly associated with the new Illumina flow cells that became available in 2011. Sequence deconvolution algorithms underwent major improvements during year 4 aimed at improving the deconvolution of reads to single BACs, increasing the average length of assembled sequences to about 25 kb, and increasing the percent coverage of BACs in assembled sequences to over 90%. Key improvements were slicing the datasets into smaller pieces for deconvolution and then joining the results of each slice (headed by UCR co-PI Lonardi), and the introduction of an error correction method that takes advantage of the pooling design (by UCR PhD student Denisa Duma).
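The combinatorial pooling idea behind the deconvolution step can be illustrated with a toy example. Each BAC is placed in a unique combination of pools (its signature), so a sequence observed in exactly the pools of one BAC's signature can be attributed to that BAC. This is a simplified sketch under stated assumptions, not the project's deconvolution software: the four-BAC, four-pool design, the names `DESIGN`, `POOLED_READS` and `deconvolve`, the k-mer length and the majority-vote rule are all hypothetical, and real designs place thousands of BACs into many more pools.

```python
from collections import Counter, defaultdict


def kmers(seq, k):
    """Yield every overlapping k-mer of seq."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_pool_sets(pooled_reads, k):
    """Map each k-mer to the frozenset of pools in which it was observed."""
    occ = defaultdict(set)
    for pool, reads in pooled_reads.items():
        for read in reads:
            for km in kmers(read, k):
                occ[km].add(pool)
    return {km: frozenset(pools) for km, pools in occ.items()}


def deconvolve(pooled_reads, design, k):
    """Assign reads to BACs by matching k-mer pool sets to BAC signatures.

    design maps BAC name -> frozenset of pool ids (signatures must be unique).
    Each read votes with the pool sets of its k-mers; the winning pool set is
    looked up among the signatures (a hypothetical, simplified rule).
    """
    occ = kmer_pool_sets(pooled_reads, k)
    signature_to_bac = {sig: bac for bac, sig in design.items()}
    assigned = defaultdict(set)
    for reads in pooled_reads.values():
        for read in reads:
            votes = Counter(occ[km] for km in kmers(read, k))
            if not votes:
                continue  # read shorter than k: no evidence
            best_pools, _ = votes.most_common(1)[0]
            bac = signature_to_bac.get(best_pools)
            if bac is not None:
                assigned[bac].add(read)
    return dict(assigned)


# Hypothetical toy design: 4 BACs, 4 pools, each BAC in a unique pair of pools.
DESIGN = {
    "BAC_A": frozenset({0, 1}),
    "BAC_B": frozenset({0, 2}),
    "BAC_C": frozenset({1, 2}),
    "BAC_D": frozenset({0, 3}),
}
A, B, C, D = "AAAACCCCGGGG", "TTTTGGGGAAAA", "CACACGTGTGTG", "ATATCGCGATAT"
POOLED_READS = {0: [A, B, D], 1: [A, C], 2: [B, C], 3: [D]}

result = deconvolve(POOLED_READS, DESIGN, k=5)
print(result)
```

Because each signature is a distinct pool combination, a read whose k-mers are seen in pools {0, 1} can only have come from BAC_A. K-mers from repeats shared by several BACs produce pool sets that match no signature, which is one reason a rule this simple would need further refinement on real data.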
Another improvement was to define k-mers specific to each flow-sorted chromosome arm and use them to allocate each sequenced BAC with very high confidence to a specific arm (by UCR PhD student Rachid Ounit). A further improvement was an empirical procedure to identify the critical depth of coverage for each BAC. We observed that excess depth of coverage inflates the BAC assembly, and that the critical depth above which inflation becomes an issue is highly dependent on the overall quality of the data ("less is more"). Each of these improvements involved new algorithms and is heading toward its own publication, but each has also set in motion a series of iterations that, as of November 2013, had not yet reached a logical stopping point.
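Allocating BACs to chromosome arms with arm-specific k-mers can be sketched as follows. K-mers observed in the sequence of exactly one flow-sorted arm are kept as markers for that arm, and a BAC is assigned to the arm that supplies nearly all of its marker hits. This is a minimal sketch under assumptions, not the project's algorithm: the function names, the toy arm data, the k-mer length and the 0.9 agreement threshold (`min_fraction`) are hypothetical, and real arm datasets would require memory-efficient k-mer counting.

```python
from collections import Counter


def kmers(seq, k):
    """Yield every overlapping k-mer of seq."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def arm_specific_kmers(arm_reads, k):
    """Return {kmer: arm} for k-mers observed in exactly one arm's reads."""
    owner = {}      # k-mer -> the single arm it has been seen in so far
    shared = set()  # k-mers seen in two or more arms (discarded)
    for arm, reads in arm_reads.items():
        for read in reads:
            for km in kmers(read, k):
                if km in shared:
                    continue
                prev = owner.get(km)
                if prev is None:
                    owner[km] = arm
                elif prev != arm:
                    shared.add(km)
                    del owner[km]
    return owner


def assign_bac_to_arm(bac_seq, specific, k, min_fraction=0.9):
    """Return the arm supplying >= min_fraction of the BAC's arm-specific hits, else None."""
    hits = Counter(specific[km] for km in kmers(bac_seq, k) if km in specific)
    total = sum(hits.values())
    if total == 0:
        return None  # no arm-specific evidence in this BAC
    arm, n = hits.most_common(1)[0]
    return arm if n / total >= min_fraction else None


# Hypothetical toy data: one read per arm; "GGGGA" and "ATATA" occur in both arms.
ARM_READS = {
    "1HS": ["AAAACCCCGGGGATATA"],
    "1HL": ["TTTTGGGGAAAAATATA"],
}
SPECIFIC = arm_specific_kmers(ARM_READS, k=5)
print(assign_bac_to_arm("CCCCGGGGATATA", SPECIFIC, k=5))
```

Requiring near-unanimous agreement, rather than a simple plurality, is one way to keep allocations high-confidence: a BAC whose hits are split between arms is left unassigned instead of being forced onto one arm.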
Publications
- Type:
Journal Articles
Status:
Published
Year Published:
2012
Citation:
International Barley Sequencing Consortium, Mayer, K.F., Waugh, R., Brown, J.W., Schulman, A., Langridge, P., Platzer, M., Fincher, G.B., Muehlbauer, G.J., Sato, K., Close, T.J., Wise, R.P., Stein, N. 2012. A physical, genetic and functional sequence assembly of the barley genome. Nature 491:711-716.
- Type:
Journal Articles
Status:
Published
Year Published:
2013
Citation:
Lonardi, S., Duma, D., Alpert, M., Cordero, F., Beccuti, M., Bhat, P.R., Wu, Y., Ciardo, G., Alsaihati, B., Ma, Y., Wanamaker, S., Resnik, J., Bozdag, S., Luo, M-C., Close, T.J. 2013. Combinatorial pooling enables selective sequencing of the barley gene space. PLOS Computational Biology 9:e1003010.
Progress 09/01/12 to 08/31/13
Outputs
Target Audience: Barley researchers and others may access, download and use data, maps and assemblies generated by this project through freely accessible portals including http://harvest.ucr.edu, www.harvest-blast.org, www.harvest-web.org, http://triticeaetoolbox.org, http://trace.ncbi.nlm.nih.gov/Traces/sra, and through other portals posted on www.barley-genome.org, which is the website of the International Barley Sequencing Consortium.
Changes/Problems: The original proposal for this project ended with a list of alternative, highly optimistic objectives ("Plan B"), anticipating reduced sequencing costs and possible opportunities to leverage other work. The review panel looked favorably on adjusting the work plan to take advantage of such developments. Sequencing costs did in fact decline significantly, so all plans were adjusted upward in year 2. The major adjustments were: 1) the low-copy portion of the genome was identified by shotgun sequencing instead of high Cot fractionation, 2) the transcriptome was sequenced more deeply and from more libraries than originally planned, 3) hypomethylated fragments were sequenced more deeply, and 4) the complete set of about 15,600 gene-bearing minimal tiling path BACs was included in the sequencing instead of only about 2,000 previously mapped BACs. Additional enhancements aimed to maximize the number of sequenced BACs and sequence scaffolds that can be anchored to the genetic map. These included: 1) increasing the resolution of the genetic linkage map and improving marker order by gathering existing data from other projects to create a new consensus map, 2) an estimated 20X sequencing of chromosome arms to enable allocation of sequenced BACs and genome shotgun sequences to arms, 3) use of low-coverage sequencing to assign tens of thousands of sequence strings to low-resolution map bins, and 4) cross-sharing of information between this project and European barley genome initiatives.
Another major change is to extend the project to 4.5 years via no-cost extensions. This extension will enable completion of all data collection, orderly databasing, further sharing and release, and the preparation of publications during the final few months of the fully extended period.
What opportunities for training and professional development has the project provided? The following trainees have been part of this project, with brief descriptions of their training:
- Matthew Alpert. UC Riverside, undergraduate, then MSc student in Computer Sciences. Genome sequence assembly and assembly analysis.
- Denisa Duma. UC Riverside, PhD student in Computer Sciences, from Romania. Data simulations, algorithm development for combinatorial sequencing including error correction.
- Burair Alsaihati. UC Riverside, MSc student in Computer Sciences, from Saudi Arabia. Data simulations, algorithm development for combinatorial sequencing.
- Seyed Mirebrahim. UC Riverside, MSc student in Computer Sciences, from Iran. Comparisons of BAC assemblies from combinatorial sequencing with BACs sequenced by other methods.
- Rachid Ounit. UC Riverside, PhD student in Computer Sciences, from France. Algorithm to define chromosome-, arm- and centromere-specific k-mers and apply them to sequenced BACs for allocation of BACs to each k-mer source.
- Francesca Cordero. UC Riverside visiting PhD scientist from University of Torino, Italy, Computer Sciences. Algorithm development for combinatorial sequencing.
- Marco Beccuti. UC Riverside visiting PhD scientist from University of Torino, Italy, Computer Sciences. Algorithm development for combinatorial sequencing.
- Yaqin Ma. UC Riverside, Visiting Research Scientist, from China. Preparation of libraries for sequencing.
- Heather Roberson. UC Riverside, Laboratory Assistant. General wet lab support.
- Maria Munoz. U Minnesota, post-doc. New SNP consensus map.
- Matthew Moscou. Sainsbury Laboratory, post-doc. New SNP consensus map.
- Qihui Zhu. U Georgia, post-doc. Analysis of hypomethylated partially restricted DNA.
How have the results been disseminated to communities of interest? In addition to the dissemination actions listed under "Products" and "Other Products", members of our group have made oral and poster presentations at several scientific meetings and have given research seminars at a number of locations during the reporting period. One example is an oral presentation by PhD student Denisa Duma, "Combinatorial Pooling Enables Selective Sequencing of the Barley Gene Space", at the ISMB/ECCB meeting, July 18-23, 2013, in Berlin, Germany.
What do you plan to do during the next reporting period to accomplish the goals? The purpose of the requested next period is to provide time to:
1) perform another round of optimization of Morex barley BAC assemblies using a sequence error correction algorithm recently developed for our combinatorial sequencing method by one of our PhD students,
2) complete the annotation and databasing of newly optimized BAC sequences,
3) deposit all BAC sequences from this project at NCBI and share them with the International Barley Sequencing Consortium,
4) provide public access to the newly optimized assemblies through harvest-blast.org and harvest-web.org,
5) finish preparing at least one additional publication on the work funded by this project and pay for at least one more Open Access publication on the work, and
6) cover some of the expenses for the annual Plant & Animal Genome meeting, which will include a business meeting of the IBSC where a summary of the status of this project will be presented.
Impacts
What was accomplished under these goals?
Impact of this project. Barley has been a favorite target of geneticists for decades and is still a major crop in the USA, where it is used for animal feed, as a source of malt for the brewing industry, as a food ingredient for human consumption and as a biofuel feedstock. Its long history of genetic studies has resulted in the collection and characterization of thousands of mutants and in genetic maps containing an aggregate of tens of thousands of molecular markers. Over 27,000 accessions of Hordeum vulgare ssp. vulgare and accessions from 30 other Hordeum species are available at the National Small Grains Collection at Aberdeen, ID. Barley geneticists have mapped loci for over 100 quantitative phenotypes, including agronomic performance, resistance to biotic and abiotic stresses, malting quality traits, and physiological and morphological traits. The genetic maps, mutants, germplasm resources and QTL provide excellent resources that are being exploited by barley breeders using technologies based on genome knowledge. To support this progression, the International Barley Sequencing Consortium (IBSC) has established near- and long-term priorities for barley genomics (http://barleygenome.org). The IBSC includes members in the US, Germany, UK, Finland, Australia, Japan, China and Denmark. The short-term priority was to sequence all barley genes, a feat that was largely accomplished and summarized in our paper published in the journal Nature in 2012. The longer-term priority is a complete genome sequence, with an interim goal of a physical map anchored to the genetic map. This USDA-NIFA project has become the largest source of direct information supporting the goal of anchoring the physical map to the genetic map, by providing sequences of gene-enriched fragments of the barley genome.
In essence, this project has built upon existing resources developed from several USDA and NSF projects led by the investigators, in coordination with the IBSC, to improve knowledge of where genes are positioned within the barley genome. This project has resolved the positions of about 70% of all barley genes through direct evidence obtained by sequencing large fragments of the barley genome, and has provided access to this information before it became available from any other source.
Specific Objectives. Objective Set 1: Determine the sequences of nearly all Morex barley genes, including 5' and 3' flanking regions.
1A. Lower copy fraction of the genome. The plan to apply Roche 454 sequencing to generate 1 Gb of physically fractionated high Cot DNA was replaced by whole-genome shotgun sequencing using Illumina methods. During years 2 and 3 we accumulated DNA sequences from several Illumina runs, then generated a series of assemblies and provided public BLAST access to each new version through www.harvest-blast.org and download through www.harvest-web.org. As of December 2011, 164 Gb of adaptor- and quality-trimmed sequences (31X genome coverage) had been assembled using SOAPdenovo to produce "Barley Genome 0.05". Analysis of k-mer frequencies showed that the assembly was strongly biased toward low-copy sequences, as hoped. The v.0.05 assembly size of 1.19 Gb represents only 22% of the total 5.3 Gb genome, but BLAST searches show that it includes >90% of all previously identified barley gene sequences. This gene enrichment method was very successful.
1B. Hypomethylated partially restricted DNA. Morex nuclear DNA was partially digested with two methyl-sensitive restriction endonucleases, then the smaller fragments were isolated and sequenced using Illumina methods. At the time of this report, analysis of the results had not yet been performed.
1C. Diverse cDNA libraries.
The plan for one full run of Roche 454 sequencing to generate 0.5 Gb of sequence from two Morex cDNA libraries was upgraded to one lane of Illumina 2x75, providing ~3 Gb of sequence data from a total of five multiplexed Morex cDNA libraries. The sequences were assembled using Velvet/Oases and posted on www.harvest-blast.org and www.harvest-web.org in Summer 2011 for public access. The assembly contains sequences from 70.3% of all previously known barley transcripts. This gene enrichment method was reasonably successful and helped with gene annotation, but added very few sequences that were not already included in the genome shotgun assembly. This transcriptome dataset was subsumed within a larger transcriptome sequencing effort that was included in our (IBSC) Nature 2012 publication.
Objective Set 2: Sequence gene-bearing BACs from an existing genome-wide minimal tiling path (MTP). The initial restriction to ~2000 previously mapped BACs using both Roche 454 and Illumina sequencing was replaced by sequencing the entire gene-bearing MTP. This amounted to approximately 15,600 BACs, organized into 714 combinatorial pools. As of November 2013, all 714 BAC pool libraries had been sequenced at least once, and about 200 pools had been replaced or sequenced a second time to compensate for technical problems, mainly associated with the new Illumina flow cells that became available in 2011. Sequence deconvolution algorithms underwent major improvements during year 4 aimed at improving the deconvolution of reads to single BACs, increasing the average length of assembled sequences to about 30 kb, and increasing the percent coverage of BACs in assembled sequences to over 90%. Key improvements were slicing the datasets into smaller pieces for deconvolution and then joining the results of each slice (headed by co-PI Lonardi), and the introduction of an error correction method that takes advantage of the pooling design (by PhD student Denisa Duma).
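One way a pooling design can expose sequencing errors is sketched below. With adequate coverage, a genuine k-mer from a BAC is observed in every pool of that BAC's signature, whereas a k-mer created by a sequencing error usually turns up in only one pool. The sketch only flags reads containing k-mers whose pool set covers no complete BAC signature. It is a simplified illustration of the principle under stated assumptions, not the project's error correction method (which corrects rather than merely flags); the function names, toy design and reads are hypothetical.

```python
from collections import defaultdict


def kmers(seq, k):
    """Yield every overlapping k-mer of seq."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_pool_sets(pooled_reads, k):
    """Map each k-mer to the set of pools in which it was observed."""
    occ = defaultdict(set)
    for pool, reads in pooled_reads.items():
        for read in reads:
            for km in kmers(read, k):
                occ[km].add(pool)
    return occ


def split_reads(pooled_reads, design, k):
    """Split reads into (clean, suspect) sets using the pooling design.

    A k-mer is trusted if its pool set contains the full signature of at
    least one BAC; a read with any untrusted k-mer is flagged as suspect.
    """
    occ = kmer_pool_sets(pooled_reads, k)
    signatures = list(design.values())
    trusted = {km for km, pools in occ.items()
               if any(sig <= pools for sig in signatures)}
    clean, suspect = set(), set()
    for reads in pooled_reads.values():
        for read in reads:
            if all(km in trusted for km in kmers(read, k)):
                clean.add(read)
            else:
                suspect.add(read)
    return clean, suspect


# Hypothetical toy design and reads; the second read in pool 1 carries a C->T error.
DESIGN = {"BAC_A": frozenset({0, 1}), "BAC_B": frozenset({0, 2})}
POOLED_READS = {
    0: ["AAAACCCCGGGG", "TTTTGGGGAAAA"],
    1: ["AAAACCCCGGGG", "AAAACCTCGGGG"],
    2: ["TTTTGGGGAAAA"],
}
clean, suspect = split_reads(POOLED_READS, DESIGN, k=5)
print(sorted(clean), sorted(suspect))
```

The erroneous read shares its first and last k-mers with BAC_A, but its middle k-mers (such as "AACCT") were seen only in pool 1, which is not a complete signature for any BAC, so the read is flagged.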
Another improvement was to define k-mers specific to each flow-sorted chromosome arm and use them to allocate each sequenced BAC with very high confidence to a specific arm (by PhD student Rachid Ounit). A further improvement was an empirical procedure to identify the critical depth of coverage for each BAC. We observed that excess depth of coverage inflates the BAC assembly, and that the critical depth above which inflation becomes an issue is highly dependent on the overall quality of the data (“less is more”). Each of these improvements involved new algorithms and is heading toward its own publication, but each has also set in motion a series of iterations that, as of November 2013, had not yet reached a logical stopping point.
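The “less is more” observation suggests an empirical search: downsample each BAC's reads to several candidate depths, assemble each subsample, and keep the depth whose assembly size is closest to the expected BAC length. The sketch below is a hypothetical illustration of that search, not the project's program. It assumes a caller-supplied `assemble` function that returns total assembled length (no particular assembler is implied), uses random read subsampling with a fixed seed, and uses closeness to the expected length as the selection criterion; all names and the toy stand-in assembler are assumptions.

```python
import random


def depth_of(reads, target_length):
    """Average depth of coverage: total read bases divided by target length."""
    return sum(len(r) for r in reads) / target_length


def subsample_to_depth(reads, target_length, depth, seed=0):
    """Randomly keep reads so that the expected depth is about `depth`."""
    current = depth_of(reads, target_length)
    if current <= depth:
        return list(reads)  # already at or below the requested depth
    rng = random.Random(seed)
    keep = depth / current
    return [r for r in reads if rng.random() < keep]


def pick_critical_depth(reads, bac_length, candidate_depths, assemble, seed=0):
    """Return the candidate depth whose assembly size is closest to bac_length.

    Too little depth gives an incomplete (too short) assembly; too much depth
    can inflate it (too long). `assemble` is a hypothetical callable that
    takes a list of reads and returns the total assembled length.
    """
    def size_error(depth):
        sample = subsample_to_depth(reads, bac_length, depth, seed)
        return abs(assemble(sample) - bac_length)

    return min(sorted(candidate_depths), key=size_error)


# Toy stand-in for an assembler: complete at >= 10X, inflated above 40X.
def fake_assemble(sample, bac_length=1000):
    d = depth_of(sample, bac_length)
    return bac_length * min(1.0, d / 10) + max(0.0, d - 40) * 20


READS = ["A" * 100] * 1000  # 100X over a hypothetical 1,000 bp target
print(pick_critical_depth(READS, 1000, [5, 20, 80], fake_assemble))
```

With the toy assembler, 5X gives a truncated assembly and 80X an inflated one, so the search settles on 20X. With a real assembler, the shape of this curve, and therefore the chosen depth, would depend on data quality, as the text above notes.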
Publications
- Type:
Journal Articles
Status:
Published
Year Published:
2012
Citation:
International Barley Sequencing Consortium, Mayer, K.F., Waugh, R., Brown, J.W., Schulman, A., Langridge, P., Platzer, M., Fincher, G.B., Muehlbauer, G.J., Sato, K., Close, T.J., Wise, R.P., Stein, N. 2012. A physical, genetic and functional sequence assembly of the barley genome. Nature 491:711-716.
- Type:
Journal Articles
Status:
Published
Year Published:
2013
Citation:
Lonardi, S., Duma, D., Alpert, M., Cordero, F., Beccuti, M., Bhat, P.R., Wu, Y., Ciardo, G., Alsaihati, B., Ma, Y., Wanamaker, S., Resnik, J., Bozdag, S., Luo, M-C., Close, T.J. 2013. Combinatorial pooling enables selective sequencing of the barley gene space. PLOS Computational Biology 9:e1003010.
Progress 09/01/11 to 08/31/12
Outputs
OUTPUTS: Objective Set 1: Determine the sequences of nearly all Morex barley genes, including 5' and 3' flanking regions.
1A. Lower copy fraction of the genome. The plan to apply Roche 454 sequencing to generate 1 Gb of physically fractionated high Cot DNA was replaced by whole-genome shotgun sequencing using Illumina methods. During years 2 and 3 we accumulated DNA sequences from several Illumina runs, then generated a series of assemblies and provided public BLAST access to each new version through www.harvest-blast.org and download through www.harvest-web.org. As of August 2012, 164 Gb of adaptor- and quality-trimmed sequences (31X genome coverage) had been assembled using SOAPdenovo to produce "Barley Genome 0.05". Analysis of k-mer frequencies showed that the assembly was strongly biased toward low-copy sequences, as hoped. The v.0.05 assembly size of 1.19 Gb represents only 22% of the total 5.3 Gb genome, but BLAST searches show that it includes >90% of all previously identified barley gene sequences. This gene enrichment method was very successful.
1B. Hypomethylated partially restricted DNA. Morex nuclear DNA was partially digested with two methyl-sensitive restriction endonucleases, then the smaller fragments were isolated and sequenced using Illumina methods. At the time of this report, results of the analysis were not yet available.
1C. Diverse cDNA libraries. The plan for one full run of Roche 454 sequencing to generate 0.5 Gb of sequence from two Morex cDNA libraries was upgraded to one lane of Illumina 2x75, providing ~3 Gb of sequence data from a total of five multiplexed Morex cDNA libraries. The sequences were assembled using Velvet/Oases and posted on www.harvest-blast.org and www.harvest-web.org for public access. The assembly contains sequences from 70.3% of all previously known barley transcripts.
This gene enrichment method was reasonably successful and is helping with gene annotation, but added very few sequences that were not already included in the genome shotgun assembly.
Objective Set 2: Sequence gene-bearing BACs from an existing genome-wide minimal tiling path (MTP). The initial restriction to 2197 previously mapped BACs using both Roche 454 and Illumina sequencing was replaced by sequencing the entire gene-bearing MTP. This amounted to approximately 14,600 BACs, organized into 637 combinatorial pools. As of August 2012, all 637 BAC pool libraries had been sequenced at least once, and about 200 pools had been replaced or sequenced a second time to compensate for technical problems, mainly associated with the new Illumina flow cells that became available in 2011. Sequence deconvolution algorithms were modified during year 3 to reduce processing time while also enabling incorporation of lower-copy repetitive sequences into the individual BAC assemblies. Flow-sorted arms 2HS through 7HL and the entire chromosome 1H were also sequenced at 10-20X depth of coverage to provide a reference dataset for assigning BACs to arm positions, with raw data released through the NCBI Sequence Read Archive.
PARTICIPANTS:
- Timothy Close. UC Riverside, Professor and Geneticist, Project Director, involved in all aspects of the project.
- Stefano Lonardi. UC Riverside, Professor, Computer Sciences, Co-Project Director, in charge of algorithm development and advanced programming.
- Matthew Alpert. UC Riverside, undergraduate student in Computer Sciences. Genome sequence assembly and assembly analysis.
- Steve Wanamaker. UC Riverside, Programmer. DNA sequence processing, transcriptome sequence assembly, HarvEST:Barley relational database, BLAST server, systems administration.
- Denisa Duma. UC Riverside, PhD student in Computer Sciences, from Romania. Data simulations, algorithm development for combinatorial sequencing, gene annotations.
- Burair Alsaihati. UC Riverside, MS student in Computer Sciences, from Saudi Arabia. Data simulations, algorithm development for combinatorial sequencing.
- Gianfranco Ciardo. UC Riverside, Professor, Computer Sciences. Algorithm development for combinatorial sequencing.
- Francesca Cordero. UC Riverside visiting PhD scientist from University of Torino, Italy, Computer Sciences. Algorithm development for combinatorial sequencing.
- Marco Beccuti. UC Riverside visiting PhD scientist from University of Torino, Italy, Computer Sciences. Algorithm development for combinatorial sequencing.
- Raymond Fenton. UC Riverside, Staff Research Associate. Wet lab support for BAC and genomic DNA sequencing, gene-BAC deconvolution.
- Yaqin Ma. UC Riverside, Visiting Research Scientist, from China. Preparation of libraries for sequencing.
- Heather Roberson. UC Riverside, Laboratory Assistant. General wet lab support.
- Maria Munoz. U Minnesota, post-doc. New SNP consensus map.
- Gary Muehlbauer. U Minnesota, Professor, co-Project Director. Assistance with the manuscript on the new SNP consensus map, interactions with GrainGenes for community access to results.
- Matthew Moscou. Sainsbury Laboratory. New SNP consensus map.
- Jeffrey Bennetzen. U Georgia, Professor, co-Project Director. Oversight of gene enrichment using hypomethylated partially restricted DNA.
- Qihui Zhu. U Georgia. Analysis of hypomethylated partially restricted DNA.
- Hao Wang. U Georgia. Analysis of hypomethylated partially restricted DNA.
- Roger Wise. USDA and Iowa State University. Spot checking of sequence assemblies, dialog about the transcriptome.
- Nils Stein. Leibniz Institute, Gatersleben. Discussion of unification of European data with results of this project.
- International Barley Sequencing Consortium. Exchange of information regarding new project goals, progress, data sharing and coordinated publications.
- Vicki Carollo-Blake. USDA Albany, GrainGenes. Posting of the new consensus map and other barley genome information on the GrainGenes website and in the TriticeaeCAP data portal.
TARGET AUDIENCES: Barley researchers and others may access, download and use data, maps and assemblies generated by this project through freely accessible portals including http://harvest.ucr.edu, www.harvest-blast.org, www.harvest-web.org, http://triticeaetoolbox.org, http://trace.ncbi.nlm.nih.gov/Traces/sra, and through other portals posted on www.barley-genome.org, which is the website of the International Barley Sequencing Consortium.
PROJECT MODIFICATIONS: The original proposal for this project ended with a list of alternative, more optimistic objectives ("Plan B"), anticipating reduced sequencing costs and possible opportunities to leverage other work. The review panel looked favorably on adjusting the work plan to take advantage of such developments. Sequencing costs did in fact decline significantly, so all plans were adjusted upward in year 2. The major adjustments were: 1) the low-copy portion of the genome was identified by shotgun sequencing instead of high Cot fractionation, 2) the transcriptome was sequenced more deeply and from more libraries than originally planned, 3) hypomethylated fragments were sequenced more deeply, and 4) the complete set of about 14,600 gene-bearing minimal tiling path BACs has been included in the sequencing instead of only about 2,000 previously mapped BACs.
Additional enhancements were to maximize the number of sequenced BACs and sequence scaffolds that can be anchored to the genetic map including: 1) increased resolution of the genetic linkage map and improved marker order by gathering existing data from other projects to create a new consensus map, 2) an estimated 20X sequencing of chromosome arms to enable allocation of sequenced BACs and genome shotgun to arms, 3) utilization of sequence tag map bin assignments of collaborator Jesse Poland to assign many BACs and sequence scaffolds to map bins, and 4) cross-sharing of information between this project and European barley genome initiatives. Another major change is to extend the project to four years via no-cost extensions. This extension will enable completion of all data collection, orderly databasing, further sharing and release, and the preparation of publications during year four.
Impacts Objective Set 1: Determine the sequences of nearly all Morex barley genes, including 5' and 3' flanking regions. The objective of identifying >95% of all Morex barley genes using three gene enrichment approaches appears to have been accomplished. This information, along with parallel and in some ways more extensive information generated by others in the IBSC and included in the IBSC Nature paper, is assisting with gene annotation on BACs and in the genome in general. Objective Set 2: Sequence gene-bearing BACs from an existing genome-wide minimal tiling path. The initial objective of increasing knowledge of the gene content of 2000 previously mapped BACs has been far exceeded. The outcome of the work in this project is that about 2/3 of all expressed genes have been allocated to sequenced BACs, about 90% of all sequenced BACs have been assigned at least to a chromosome arm, and most BACs have been resolved to a genetic map position. Because all of our work has been on Morex barley, all of this information can be merged, and much of it has been merged, into the Morex sequence database led by IBSC partners in Germany. The outcome of the joint effort is an information resource that greatly facilitates map-based cloning and marker development for breeding, available through web interfaces provided by MIPS or our HarvEST portals and software. The outcome of this BAC sequencing objective is much more comprehensive than was originally anticipated when the project was proposed, to the extent that, despite very large commitments to physical mapping and BAC sequencing in Europe, this project is now the leading primary source of barley BAC sequences. Objective Set 3: Assemble, annotate, release data for public access. As stated above, the genome shotgun and transcriptome assemblies were produced and made publicly available through www.harvest-blast.org and www.harvest-web.org. 
In addition, the availability of genome assembly version 0.05 was announced on the GrainGenes website and through a brief publication in the Barley Genetics Newsletter in 2011. All of the shotgun genome sequences, transcriptome sequences, and assemblies from these and the first set of 2190 BACs were provided to International Barley Sequencing Consortium partners in 2011 with the intention of including all of this information in a publication jointly prepared by the community. Raw data were also released through the NCBI Sequence Read Archive. The BAC assemblies were integrated into an IBSC manuscript that will appear in Nature in late 2012. As of August 2012, genome assembly version 0.05, both unmasked and masked for highly repetitive sequences, had been provided to 45 investigators who requested it through the harvest-web.org interface. To simplify information release agreements, cross-referencing of this assembly to an alternative Morex shotgun genome assembly included in the IBSC Nature paper is awaiting full public release of the information in the Nature paper.
Publications
- Alpert M, Wanamaker S, Duma D, Fenton RD, Ma Y, Muehlbauer GJ, Lonardi S, Close TJ. 2011. A genome sequence resource for barley. Barley Genetics Newsletter 41:10-11.
- Close TJ, Alpert M, Lonardi S, Duma D, Wanamaker S. 2011. Barley Genome 0.05. www.harvest-blast.org and www.harvest-web.org
- Close TJ, Wanamaker S. 2012. HarvEST:Barley 1.84-1.87. http://harvest.ucr.edu, www.harvest-web.org
- Lonardi S, Duma D, Alpert M, Cordero F, Beccuti M, Bhat PR, Wu Y, Ciardo G, Alsaihati B, Ma Y, Wanamaker S, Resnik J, Close TJ. 2011. Barcoding-free BAC pooling enables combinatorial selective sequencing of the barley gene space. arXiv:1112.4438v1.
- International Barley Sequencing Consortium. 2012. A physical, genetical and functional sequence assembly of the barley genome. Nature (accepted, in process).
- Munoz-Amatriain M, Moscou MJ, Bhat PR, Svensson JT, Bartos J, Suchankova P, Simkova H, Endo TR, Fenton RD, Lonardi S, Castillo AM, Chao S, Cistue L, Cuesta-Marcos A, Forrest KL, Hayden MJ, Hayes PM, Horsley RD, Makato K, Moody D, Sato K, Valles MP, Wulff BB, Muehlbauer GJ, Dolezel J, Close TJ. 2011. An improved consensus linkage map of barley based on flow-sorted chromosomes and SNP markers. The Plant Genome: 238-249.
Progress 09/01/10 to 08/31/11
Outputs OUTPUTS: Objective Set 1: Determine the sequences of nearly all Morex barley genes, including 5' and 3' flanking regions. 1A. Lower copy fraction of the genome. The plan to apply Roche 454 sequencing to generate 1 Gb of physically fractionated high Cot DNA was replaced by shotgun whole-genome sequencing using Illumina methods. During year 2 we accumulated DNA sequences from several Illumina runs, then generated a series of assemblies and provided public access to each new version through www.harvest-blast.org. As of August 2011, 125 Gb of adaptor- and quality-trimmed sequences (23X genome coverage) were assembled using SOAPdenovo to produce "Barley Genome 0.04". Analysis of k-mer frequencies showed that the assembly was strongly biased toward low-copy sequences, as we had hoped. The v.0.04 assembly size of 1.13 Gb represents only 21% of the total 5.3 Gb genome, but by BLAST includes 89.5% of all previously identified barley gene sequences. This gene enrichment method was very successful. 1B. Hypomethylated partially restricted DNA. Morex nuclear DNA was partially digested with two methyl-sensitive restriction endonucleases, then the smaller fragments were isolated and sequenced using Illumina methods. At the time of this report, analysis of the results was still in progress. 1C. Diverse cDNA libraries. The plan for one full run of Roche 454 sequencing to generate 0.5 Gb of sequence from two Morex cDNA libraries was upgraded to one lane of Illumina 2x75, providing ~3 Gb of sequence data from a total of five multiplexed Morex cDNA libraries. The sequences were assembled using Velvet/Oases and posted on www.harvest-blast.org for public access. The assembly contains sequences from 70.3% of all previously known barley transcripts. This gene enrichment method was reasonably successful and will help with gene annotation, but added very few sequences that were not included in the genome shotgun assembly; the union set was 89.8% of all barley genes. 
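The coverage and genome-fraction figures for Barley Genome 0.04 are simple ratios against the 5.3 Gb genome size used in this report. The minimal sketch below reproduces them. The function names are illustrative only and are not part of any project software. The raw ratio for 125 Gb is about 23.6X, which the report gives as 23X.

```python
"""Back-of-the-envelope checks for the shotgun assembly figures above."""

GENOME_SIZE_GB = 5.3  # barley genome size used in this report


def fold_coverage(sequenced_gb: float, genome_gb: float = GENOME_SIZE_GB) -> float:
    """Average sequencing depth: total trimmed bases divided by genome size."""
    return sequenced_gb / genome_gb


def genome_fraction(assembly_gb: float, genome_gb: float = GENOME_SIZE_GB) -> float:
    """Fraction of the genome length represented by the assembly length."""
    return assembly_gb / genome_gb


if __name__ == "__main__":
    # Barley Genome 0.04: 125 Gb of trimmed reads, 1.13 Gb assembly
    print(f"coverage: {fold_coverage(125):.1f}X")              # coverage: 23.6X
    print(f"assembly fraction: {genome_fraction(1.13):.1%}")   # assembly fraction: 21.3%
```

Applied to the ~30 Gb of shotgun sequence reported for mid-August 2010, the same ratio gives about 5.7X (reported as ~6X). A 23X dataset collapsing to only 21% of the genome length is consistent with highly repetitive sequence being merged in assembly, which leaves the low-copy, gene-rich portion over-represented.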
Objective Set 2: Sequence gene-bearing BACs from an existing genome-wide minimal tiling path. The initial confinement to 2197 previously mapped BACs using both Roche 454 and Illumina sequencing was replaced by sequencing the entire gene-bearing MTP. This was approximately 14,600 BACs in 637 combinatorial pools. As of August 2011, all 637 BAC pool libraries had been prepared, and all but 39 of these were sequenced using Illumina methods, with the final 39 in the sequencer queue for September 2011. About 15% of these libraries had lower than desired sequencing yields and were in the process of being repeated. PARTICIPANTS: Timothy Close. UC Riverside, Professor and Geneticist, Project Director, involved in all aspects of the project. Stefano Lonardi. UC Riverside, Professor, Computer Sciences, Co-Project Director, in charge of algorithm development and advanced programming. Matthew Alpert. UC Riverside, Undergraduate student in Computer Sciences, genome sequence assembly and assembly analysis. Steve Wanamaker. UC Riverside, Programmer, DNA sequence processing, transcriptome sequence assembly, HarvEST:Barley relational database, BLAST server, systems administrator. Denisa Duma. UC Riverside, PhD student in Computer Sciences, from Romania, data simulations, algorithm development for combinatorial sequencing, gene annotations. Burair Alsaihati. UC Riverside, MS student in Computer Sciences, from Saudi Arabia. Data simulations, algorithm development for combinatorial sequencing. Gianfranco Ciardo. UC Riverside, Professor, Computer Sciences. Algorithm development for combinatorial sequencing. Francesca Cordero. UC Riverside visiting PhD scientist from University of Torino, Italy, Computer Sciences. Algorithm development for combinatorial sequencing. Marco Beccuti. UC Riverside visiting PhD scientist from University of Torino, Italy, Computer Sciences. Algorithm development for combinatorial sequencing. Raymond Fenton. UC Riverside, Staff Research Associate. 
Wet lab support for BAC and genomic DNA sequencing, gene-BAC deconvolution. Yaqin Ma. UC Riverside, PhD Research Scientist, from China. Preparation of libraries for sequencing. Heather Roberson. UC Riverside, Laboratory Assistant. General wet lab support. Maria Munoz. U Minnesota, post-doc. New SNP consensus map. Gary Muehlbauer. U Minnesota, Professor, co-Project Director. Assistance with manuscript on new SNP consensus map, interactions with GrainGenes for community access to results. Matthew Moscou. Collaborator, Sainsbury Laboratory, Norwich, UK. New SNP consensus map. Jeffrey Bennetzen. U Georgia, Professor, co-Project Director. Oversight of gene enrichment using hypomethylated partially restricted DNA. Qihui Zhu and Hao Wang, U Georgia. Production and analysis of hypomethylated partially restricted DNA. Roger Wise. Collaborator, USDA and Iowa State University, Ames, Iowa. Spot checking of sequence assemblies, dialog about transcriptome. Nils Stein. Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Gatersleben, Germany. Discussion of unification of European data with results of this project. International Barley Sequencing Consortium. Exchange of information regarding new project goals, progress, data sharing and coordinated publications. TARGET AUDIENCES: Barley researchers and others may access, download and use data, maps and assemblies generated by this project through public portals initially including http://harvest.ucr.edu, www.harvest-web.org and www.harvest-blast.org, and subsequently "The Triticeae Toolbox" (http://triticeaetoolbox.org/) and other URLs posted on www.barley-genome.org, which is the website of the International Barley Sequencing Consortium. PROJECT MODIFICATIONS: The original proposal for this project ended with a list of alternative, more optimistic objectives ("Plan B"), looking ahead to reduced sequencing costs and possible opportunities to leverage other work. 
The review panel looked favorably on adjustments of the work plan to take advantage of such developments. Sequencing costs in fact did decline significantly. Consequently, all plans were adjusted upward in year 2. The major upward adjustments were as follows: 1) the low copy portion of the genome was identified by shotgun sequencing instead of high Cot fractionation, 2) the transcriptome was sequenced more deeply and from more libraries than originally planned, 3) hypomethylated fragments were sequenced more deeply, 4) the complete set of about 14,600 gene-bearing minimal tiling path BACs is being sequenced instead of only about 2,000 previously mapped BACs. In addition, other enhancements were added to maximize the number of sequenced BACs and sequence scaffolds that can be anchored to the genetic map including: 1) increased resolution of the genetic linkage map and improved marker order by gathering existing data from other projects to create a new consensus map, 2) an estimated 20X sequencing of chromosome arms to enable allocation of sequenced BACs and genome shotgun to arms, and 3) utilization of sequence tag map bin assignments of collaborator Jesse Poland to assign many BACs and sequence scaffolds to map bins. Another major change was to extend the project to three years via a no-cost extension. This extension will enable completion of all data collection, orderly databasing and the preparation of publications during year three.
Impacts Objective Set 1: Determine the sequences of nearly all Morex barley genes, including 5' and 3' flanking regions. The objective of identifying >95% of all Morex barley genes using three gene enrichment approaches appears to be on track. This information will assist in gene annotation on BACs and in the genome in general. Objective Set 2: Sequence gene-bearing BACs from an existing genome-wide minimal tiling path. The initial objective of increasing knowledge of the gene content of 2000 previously mapped BACs has been far exceeded. The outcome is shaping up to be in the range of 2/3 of all genes allocated to BACs, all BACs assigned at least to a chromosome arm, and most BACs resolved to a map position. This information will assist in map-based cloning and marker development for breeding, meeting these objectives much more fully than was originally anticipated when the project was proposed. Objective Set 3: Assemble, annotate, release data for public access. As stated above, genome shotgun and transcriptome assemblies were produced and made publicly available through www.harvest-blast.org. Dialog is underway with the GrainGenes team to port the new genetic map and sequence assemblies to "The Hordeum Toolbox". Partners in the International Barley Sequencing Consortium have been informed of developments, which will encourage unification, along with the anticipated full entry of the DOE Joint Genome Institute's Community Sequencing Program into the barley genome landscape in late 2011 to early 2012. Various announcements through newsletters, meetings and community websites were in motion as of the date of this report, in addition to full disclosure of details through technical publications submitted or in preparation.
Publications
- Close TJ, Wanamaker S, Lonardi S, Alpert M, Muehlbauer GJ, Wise RM. 2010. Barley Transcriptome 0.02. www.harvest-blast.org
- Close TJ, Alpert M, Lonardi S, Duma D, Wanamaker S. 2010. Barley Genome 0.02. www.harvest-blast.org
- Close TJ, Alpert M, Lonardi S, Duma D, Wanamaker S. 2011. Barley Genome 0.03. www.harvest-blast.org
- Close TJ, Alpert M, Lonardi S, Duma D, Wanamaker S. 2011. Barley Genome 0.04. www.harvest-blast.org
- Close TJ, Wanamaker S. 2011. HarvEST:Barley 1.82. http://harvest.ucr.edu, www.harvest-web.org
Progress 09/01/09 to 08/31/10
Outputs OUTPUTS: Objective Set 1: Determine the sequences of nearly all Morex barley genes, including 5' and 3' flanking regions. The plan to apply Roche 454 sequencing to generate 1 Gb of physically fractionated high Cot DNA may be replaced by shotgun whole-genome sequencing. Using SNP genotyping, Close verified that the Morex stocks at UC Riverside, U Minnesota and U Iowa were genetically identical. The Close lab increased the seed supply to ~100,000 seeds and produced about 60 g of tissue, yielding ~160 µg of nuclear DNA. As of mid-August, ~30 Gb of shotgun Illumina sequence (~6X genome coverage) had been generated. A decision on whether to pursue high Cot fractionation was on hold. There was no progress on hypomethylated partially restricted DNA. The plan for Roche 454 sequencing to generate 0.5 Gb of Morex transcript sequence data was upgraded to 4 Gb of Illumina sequence data. Five indexed cDNA libraries were produced and sequenced. Objective Set 2: Sequence gene-bearing BACs from an existing genome-wide minimal tiling path (MTP). The plan to run 13 pools of 169 BACs using Roche 454 was eliminated in favor of accomplishing more sequencing using the Illumina platform. The complete minimal tiling path of gene-bearing BACs is ~15,000 clones. Combinatorial pools were produced from all of the MTP clones: 7 sets of 91 pools, for a total of 637 pools of BAC DNA. The genetic map was improved by reanalyzing existing SNP data from four doubled haploid (DH) populations, including twice the number of DHs from one set of mapping parents, and by utilizing existing SNP data from several additional DH populations. The published barley SNP map is composed of 2,943 SNP markers. Maria Munoz reanalyzed the four previously used mapping populations and three additional DH mapping populations, with several more still to be analyzed. 
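Each set of 91 pools follows the layout of plan 2B: 2,197 = 13^3 BACs arranged in seven layers of 13 pools, each pool holding 169 clones, with 3-decodable gene-BAC deconvolution. The report gives these parameters but not the construction. The sketch below assumes a shifted-transversal-style design, which is one construction with exactly these properties. The decoder, `decode`, is a hypothetical elimination rule for illustration and is not the project's deconvolution software. If all 7 sets follow the same layout, together they hold up to 7 × 2,197 = 15,379 BACs, which is enough for the ~15,000 MTP clones.

```python
"""Sketch of a 3-decodable pooling design for 2,197 BACs (13 pools x 7 layers).

Assumption: a shifted-transversal-style construction; the report states only
the parameters (13 pools per layer, seven layers, 91 pools, 3-decodable).
"""
from collections import Counter

Q = 13                     # pools per layer (prime, so arithmetic is in GF(13))
LAYERS = 7                 # each BAC appears in exactly one pool per layer
GAMMA = 2                  # two distinct BACs share at most GAMMA pools
N_BACS = Q ** (GAMMA + 1)  # 13**3 = 2,197 BACs per set


def pool_ids(bac: int) -> frozenset:
    """Global pool IDs (layer * Q + pool) for one BAC.

    The BAC index is written in base Q as digits d0, d1, d2; in layer j the
    pool is (d0 + d1*j + d2*j**2) mod Q. Two distinct digit vectors give
    different quadratics over GF(Q), which agree for at most GAMMA values of j.
    """
    digits = [(bac // Q ** c) % Q for c in range(GAMMA + 1)]
    return frozenset(
        j * Q + sum(d * j ** c for c, d in enumerate(digits)) % Q
        for j in range(LAYERS)
    )


POOLS = [pool_ids(b) for b in range(N_BACS)]


def decode(positive_pools: set) -> list:
    """Hypothetical decoder: keep BACs whose every pool tested positive.

    Since LAYERS = 7 >= 3 * GAMMA + 1, three true positives can cover at most
    six of any other BAC's seven pools, so no false positive survives.
    """
    return [b for b, pools in enumerate(POOLS) if pools <= positive_pools]


if __name__ == "__main__":
    sizes = Counter(p for pools in POOLS for p in pools)
    print(len(sizes), set(sizes.values()))  # prints: 91 {169}
    # A gene sequence detected in the pools of three hypothetical BACs:
    truth = [5, 700, 2100]
    positives = set().union(*(POOLS[b] for b in truth))
    print(decode(positives))                # prints: [5, 700, 2100]
```

The design choice that matters is the ratio of layers to pairwise overlap. With at most two shared pools per pair of BACs, seven layers are the minimum that keeps three simultaneous positives unambiguous.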
Using rice and barley data simulations, we established that Illumina sequencing at 50X is adequate to assemble all of the expressed genes in a typical BAC. Sequences from a single-BAC-pool test run were compared to 116 barley EST-derived consensus sequences that had previously been associated with BACs in this pool using Illumina GoldenGate assays; 112 of these genes (97%) were identified in the sequences generated in the Illumina test run. Objective Set 3: Assemble, annotate, release data for public access. The aims of this objective set have become more specific, now including agreements with the GrainGenes team to serve the information through "The Hordeum Toolbox" (THT) and with the National Center for Biotechnology Information, in addition to continued planning to release data to IBSC partners and their additional outlets. We have initiated recurrent conversations with the developers of THT, hosted by GrainGenes, to house the integrated SNP-based genetic map and the BAC physical map, along with BAC, transcript and genome sequences. As part of the ongoing USDA-NIFA funded barley CAP effort, the THT development team holds biweekly meetings to discuss THT progress. PARTICIPANTS: Timothy J. Close (Professor, PI), Raymond D. Fenton (Staff Research Associate), Steve Wanamaker (Programmer), Yaqin Ma (Professional Researcher); Department of Botany & Plant Sciences, University of California, Riverside. Stefano Lonardi (Associate Professor, co-PI); Department of Computer Sciences, University of California, Riverside. Matthew Alpert. Undergraduate student in Computer Sciences, University of California, Riverside. Next generation sequencing, genome assembly, algorithm development. Denisa Duma. PhD graduate student in Computer Sciences, University of California, Riverside. Next generation sequencing, genome assembly, algorithm development. 
Gary J Muehlbauer (Associate Professor, co-PI), Brian Steffenson (Professor, collaborator), Maria Munoz (post-doctoral researcher); University of Minnesota. Preparation of RNA from pathogen-inoculated tissue, genetic mapping. Roger Wise (collaborator). Preparation of RNA from pathogen-inoculated tissue. Jeff Bennetzen (Professor, co-PI). Planning discussions for gene-enrichment. Victoria Carollo, David Matthews. GrainGenes team for development of data portal. International Barley Genome Sequencing Consortium members. Frequent dialog and planning for data exchange. TARGET AUDIENCES: Nothing significant to report during this reporting period. PROJECT MODIFICATIONS: Originally Stated Objectives. Objective Set 1: Determine the sequences of nearly all Morex barley genes, including 5' and 3' flanking regions. This will assist with annotation of BACs and the genome in general. Sequence to be generated using Roche 454 FLX Titanium. Note: a "full run" means that a sample is applied to an entire PicoTiter plate using a single-region gasket; a "half run" means that a sample is applied to one region of the PicoTiter plate using a two-region gasket. 1A. High Cot fractions (2 full runs, 1 Gb total); Bennetzen 1B. Hypomethylated partially restricted DNA (4 full runs, 2 Gb total); Bennetzen 1C. Diverse cDNA library (1 full run, 0.5 Gb total); Close and Lonardi. Objective Set 2: Sequence ~2000 previously mapped gene-bearing BACs from an existing minimal tiling path, mainly from chromosomes 4H, 5H and 6H. Structured on a combinatorial design for 2197 BACs for gene-BAC deconvolution. Close and Lonardi. 2A. Run 13 pools of 169 clones (20 Mb per pool, 260 Mb in 2197 BACs). No tagging, gaps will preclude most gene-BAC deconvolution. Use 13 half runs of 454 (250 Mb per half run). 2B. Run 91 pools of 169 clones (20 Mb per pool, 260 Mb in 2197 BACs). Seven layers, 13 pools per layer provide 3-decodable gene-BAC deconvolution. 
Use 46 lanes of Illumina Genome Analyzer, paired end (1.0 Gb per lane) containing two tagged pools each. Objective Set 3: Assemble, annotate, release data for public access. All investigators. Updates on Objectives. The original proposal ended with a list of alternative, more optimistic objectives ("Plan B"), looking ahead to reduced sequencing costs and possible opportunities to leverage other work. The review panel looked favorably on adjustments of the work plan to take advantage of such developments. The essence of each objective remains the same for year 2 as originally proposed, but plans have been adjusted upward. Objective Set 1: Determine the sequences of nearly all Morex barley genes, including 5' and 3' flanking regions. This will assist with annotation of BACs and the genome in general. 1A. High Cot fractions; Bennetzen. The plan to apply Roche 454 sequencing to generate 1 Gb of physically fractionated high Cot DNA may be replaced by shotgun whole-genome sequencing. 1C. Diverse cDNA library; Close and Lonardi. The plan for one full run of Roche 454 sequencing to generate 0.5 Gb of sequence data from two Morex cDNA libraries (one per half run) has been upgraded to one lane of Illumina, paired-end, 72 bases each end, to provide 4 Gb of sequence data from a total of five multiplexed Morex cDNA libraries; Close, Lonardi, Muehlbauer, and collaborator Roger Wise. 2A. The plan to run 13 pools of 169 clones (20 Mb per pool, 260 Mb in 2197 BACs) using 13 half runs of Roche 454 (250 Mb per run) has been eliminated in favor of accomplishing more sequencing under Objective 2B using the Illumina platform. 2B. Run pools of 169 clones (20 Mb per pool, 260 Mb in 2197 BACs). The plan to use 46 lanes of sequencing for a single set of 2197 BACs has been adjusted. At least three sets of 2197 BACs will be sequenced instead of a single set, and fewer lanes will be used for each set.
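Fewer lanes per set means less sequencing depth per pool. Two figures elsewhere in this report can be related with simple arithmetic: simulations found 50X adequate for assembling the genes in a BAC, and plan 2B assumes ~20 Mb per 169-clone pool. The minimal sketch below uses the idealized Lander-Waterman (Poisson) model, a textbook approximation that we swap in here. It is not the project's simulation method. Reads are assumed to start uniformly at random, which real Illumina data do not, and repeats limit assembly regardless of depth.

```python
"""Idealized depth arithmetic for BAC-pool sequencing (Lander-Waterman model)."""
import math


def expected_fraction_uncovered(depth: float) -> float:
    """Expected fraction of bases with zero reads, exp(-depth).

    Assumes uniformly random read starts (Poisson model); real coverage is
    uneven, so this is a lower bound on gaps, not a prediction.
    """
    return math.exp(-depth)


def sequence_needed_gb(pool_mb: float, depth: float) -> float:
    """Gb of reads needed to reach `depth` over a pool of `pool_mb` megabases."""
    return pool_mb * depth / 1000.0


if __name__ == "__main__":
    pool_mb = 20.0  # ~169 BACs per pool, as in plan 2B
    for depth in (25, 50):
        print(f"{depth}X: {sequence_needed_gb(pool_mb, depth):.1f} Gb per pool, "
              f"expected uncovered fraction {expected_fraction_uncovered(depth):.1e}")
```

At the proposal's stated 1.0 Gb per lane, two tagged 20 Mb pools per lane correspond to about 0.5 Gb, or ~25X, per pool. Reaching 50X would take about 1 Gb, or roughly one lane per pool. In this model, either depth leaves essentially no base unsampled, so the practical limits are uneven coverage and repeats rather than raw depth.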
Impacts Objective Set 1: Determine the sequences of nearly all Morex barley genes, including 5' and 3' flanking regions. This will assist with annotation of BACs and the genome in general. Only ~68,000 Morex EST sequences were previously available; this is a total of ~40 Mb of Morex EST sequence data, accounting for ~1/3 of all genes identified by barley ESTs. Five indexed cDNA libraries yielded about 4 Gb of Morex transcript sequence, increasing the amount of Morex barley transcript sequence data ~100-fold. Objective Set 2: Sequence previously mapped gene-bearing BACs from an existing genome-wide minimal tiling path. The complete minimal tiling path of gene-bearing BACs is ~15,000 clones. We produced combinatorial pools of BACs from the entire set of MTP clones. If we can afford to sequence all of these BAC pools, then we estimate ~25,000 genes will be anchored to BACs. The resolution of the genetic linkage map has been increased approximately 2-fold, SNP marker order has been improved on a fine scale and the number of BACs anchored to the genetic map has increased in parallel. These improvements facilitate map-based cloning and nearby marker development. Objective Set 3: Assemble, annotate, release data for public access. THT via GrainGenes will be the primary portal for data access.
Publications
- No publications reported this period