Source: UTAH STATE UNIVERSITY submitted to
ASSEMBLY OF THE OVINE WHOLE GENOME REFERENCE SEQUENCE
Sponsoring Institution
National Institute of Food and Agriculture
Project Status
TERMINATED
Funding Source
Reporting Frequency
Annual
Accession No.
0220808
Grant No.
2010-65205-20418
Project No.
UTA01002
Proposal No.
2009-03305
Multistate No.
(N/A)
Program Code
92120
Project Start Date
Jan 1, 2010
Project End Date
Dec 31, 2013
Grant Year
2010
Project Director
Cockett, N. E.
Recipient Organization
UTAH STATE UNIVERSITY
(N/A)
LOGAN,UT 84322
Performing Department
Animal Dairy & Veterinary Sciences
Non Technical Summary
This research project is directed towards the development of a high quality reference sequence assembly for sheep. At the conclusion of this project, a comprehensive reference sequence assembly will be generated using all available sequence data from a single reference animal. The resulting sequence and the consensus genome assembly will be available to the scientific community with unrestricted access through a designated portal. Substantial leveraging of funds and expertise from the International Sheep Genomics Consortium, combined with technological and computational advances in the area of de novo sequence assembly, will contribute significantly to the success of the project. The resulting ovine whole genome reference sequence will accelerate searches for genetic regions and genes influencing phenotypes in sheep, and combined with the bovine genome reference sequence, will serve as a backbone for other ruminant species. The assembly will also be a genomic resource for ovine biomedical research models.
Animal Health Component
(N/A)
Research Effort Categories
Basic
90%
Applied
10%
Developmental
(N/A)
Classification

Knowledge Area (KA)    Subject of Investigation (SOI)    Field of Science (FOS)    Percent
304                    3610                              1080                      50%
304                    3610                              1040                      50%
Knowledge Area
304 - Animal Genome;

Subject Of Investigation
3610 - Sheep, live animal;

Field Of Science
1080 - Genetics; 1040 - Molecular biology;
Goals / Objectives
Sequence data will be generated from a single reference animal using short paired-end read technologies on multiple high-throughput sequencing platforms. Assembly of the sequences will be iterative, starting with de-novo assembly into high quality contigs and continuing with information from syntenic alignments to the human, bovine, dog and horse whole genome sequences. Information from the ovine linkage, RH and cytogenetic maps and positional information from the ovine consensus genome assembly will also be utilized in the assembly of the first draft of the reference sequence. Problem areas will be finished by sequencing BAC clones that cover large gaps and array-based hybridization enrichment across short gaps. The resulting reference sequence will be equivalent to approximately 7X coverage Sanger sequencing, have an error rate of less than one base pair in 10,000, and have a minimal number of gaps.
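The accuracy target above (fewer than one erroneous base in 10,000) can be restated on the Phred quality scale commonly used for sequence data. The sketch below is illustrative only: the function names are hypothetical, and the 3 Gbp genome size is an assumption inferred from the methods (6 Gbp of data at 2X coverage), not a figure reported by the project.

```python
import math


def phred_from_error_rate(error_rate: float) -> float:
    """Convert a per-base error probability to a Phred-scaled quality score."""
    if not 0.0 < error_rate <= 1.0:
        raise ValueError("error_rate must be in (0, 1]")
    return -10.0 * math.log10(error_rate)


def expected_errors(genome_size_bp: int, error_rate: float) -> float:
    """Expected number of erroneous bases at a uniform per-base error rate."""
    return genome_size_bp * error_rate


# Target stated in the goals: less than one error per 10,000 bases.
TARGET_ERROR_RATE = 1 / 10_000

# Assumption: ~3 Gbp genome, implied by "2X ... producing 6 Gbp" in the methods.
ASSUMED_GENOME_SIZE_BP = 3_000_000_000

if __name__ == "__main__":
    q = phred_from_error_rate(TARGET_ERROR_RATE)
    errs = expected_errors(ASSUMED_GENOME_SIZE_BP, TARGET_ERROR_RATE)
    print(f"Target corresponds to roughly Phred Q{q:.0f}")
    print(f"Upper bound on expected erroneous bases: {errs:,.0f}")
```

Under these assumptions the target corresponds to about Q40, which still permits up to several hundred thousand erroneous bases across a genome of this size.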
Project Methods
In Step 1, whole genome sequence will be generated from libraries of varying insert size constructed from DNA of a single animal (Texel ram) and sequenced on the Illumina Genome Analyzer II (GAII). The GAII sequencing will generate 30X coverage of the genome in 50-75 bp reads from both ends of 200-500 bp fragments (i.e., paired-end reads) and of 2-5 Kb fragments (i.e., mate-pair reads). The use of different insert sizes is expected to increase the contiguity of the assembly by spanning homopolymer runs, SINEs, and LINEs, thereby enabling the creation of longer scaffolds. Additional sequencing on the Roche 454 platform will be undertaken to generate 2X coverage in 200-base paired-end reads from 40 Kb inserts, producing 6 Gbp of sequence data. Both the GAII and 454 sequence data will be transferred to CSIRO and AgResearch for subsequent analyses.
In Step 2, initial assembly of the DNA sequence generated in Step 1 will be undertaken using a new version of the virtual sheep genome (vsg3.0), which incorporates the latest linkage, RH and cytogenetic mapping data, as the framework. The approach will be a "seeded assembly". Ovine sequences will be grouped into 1 Mbp bins, and these sequences and their paired ends will then be assembled de novo using the Velvet algorithm (Zerbino and Birney, 2008). The resulting scaffolds will be merged across bins whenever possible. If required, a second round of assembly will include paired-end reads not included in the first round.
In Step 3, gaps within scaffolds and problem regions (such as the MHC and T cell receptor regions) will be identified through comparisons with other assembled mammalian genomes. Targeted sequence for these regions will be generated with a single or pooled BAC tiling approach and possibly array-based hybridization enrichment to fill short gaps. In this step, BAC ends will be matched to assembly contigs, and BAC clones spanning contigs will be sequenced using ABI SOLiD technology. These approaches should give the best possible chance for the gaps to be sized, sequenced and incorporated into the larger assembly.
In Step 4, targeted linkage and RH mapping in problematic regions of the assembly will be undertaken to identify the correct order of scaffolds. Markers for typing across the IMF population and the USUoRH5000 radiation hybrid panel will be designed from the reference and consensus sequences, as well as from previously identified SNPs in the problem regions. The project also has access to other linkage analysis populations comprising ~5,000 informative meioses that could be used if necessary for positioning of problem regions. Any scaffold longer than one megabase that cannot be ordered and oriented on sheep chromosomes using RH or linkage mapping techniques will be cytogenetically anchored to specific sheep chromosomes using fluorescence in-situ hybridization (FISH).
In Step 5, final comprehensive reference and consensus sequence assemblies will be developed using all available data generated through Steps 1-4.
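The sequencing depths quoted in Step 1 follow from the usual relation coverage = (number of reads x read length) / genome size. The sketch below works through that arithmetic. The function and constant names are hypothetical, and the 3 Gbp genome size is an assumption back-calculated from the statement that 2X coverage corresponds to 6 Gbp of data.

```python
def coverage(total_bases: int, genome_size_bp: int) -> float:
    """Average depth of coverage for a given amount of sequence."""
    return total_bases / genome_size_bp


def reads_needed(target_coverage: int, genome_size_bp: int, read_length_bp: int) -> int:
    """Minimum number of reads of a fixed length needed to reach a target depth."""
    total_bases = target_coverage * genome_size_bp
    # Ceiling division using integers only, to avoid floating-point rounding.
    return -(-total_bases // read_length_bp)


# Assumption: genome size implied by "2X ... producing 6 Gbp of sequence data".
ASSUMED_GENOME_SIZE_BP = 6_000_000_000 // 2

if __name__ == "__main__":
    # Illumina GAII: 30X in 50-75 bp reads (range taken from the methods text).
    for read_len in (50, 75):
        n = reads_needed(30, ASSUMED_GENOME_SIZE_BP, read_len)
        print(f"30X with {read_len} bp reads: about {n:,} reads")

    # Roche 454: 2X in 200-base reads.
    n_454 = reads_needed(2, ASSUMED_GENOME_SIZE_BP, 200)
    print(f"2X with 200 bp reads: about {n_454:,} reads")
    print(f"Depth from 6 Gbp: {coverage(6_000_000_000, ASSUMED_GENOME_SIZE_BP):.1f}X")
```

These figures are averages; actual depth varies across the genome, which is one reason the later steps target gaps and problem regions separately.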

Progress 01/01/13 to 12/31/13

Outputs
Target Audience: Researchers
Changes/Problems: Nothing Reported
What opportunities for training and professional development has the project provided? Dr. Jiang Yu was hired as a post-doctoral fellow on the project. By the last year of the project, Dr. Yu was the key contact for analyses and updates on the reference genome sequence. He is now a research scientist at the Beijing Genomics Institute (BGI).
How have the results been disseminated to communities of interest? Outcomes of this project have been disseminated through the International Sheep Genomics Consortium (ISGC), which includes investigators from across the world who are engaged in research on livestock genomics. The ISGC communicated in a variety of ways, including bi-weekly conference calls and face-to-face workshops at the annual Plant and Animal Genome (PAG) meeting and the biennial meetings of the International Society of Animal Genetics (ISAG). Presentations on the reference genome sequence and assembly were made by the principal investigators each year at the PAG and ISAG meetings. A list server with over 75 consortium participants ensured that key decisions, results and outcomes were communicated by email. Presentations were also made at the annual American Sheep Industry (ASI) meetings, and an article on the reference genome assembly was included in the 2013 ASI magazine. In this way, US sheep producers were made aware of the research activities within the project. A manuscript describing the whole genome assembly (Oar v3.1), the RH map, and the linkage map is in preparation. Highlights of differences between the genome structure of sheep, cattle and goats are included in the manuscript. The analysis of about a terabyte of transcriptome data is also included. Allelic variation, allelic imbalance and copy number variation have been included in the manuscript as points of interest. Biological stories include reproduction, digestive tract enzymes, evolution of the rumen, lipid metabolism and evolution of wool.
What do you plan to do during the next reporting period to accomplish the goals? Nothing Reported

Impacts
What was accomplished under these goals? In 2010, sequence data were generated at two sequencing facilities (Beijing Genomics Institute and the Roslin Institute) from DNA of a Texel ewe and a Texel ram, respectively. The first step of the reference sequence assembly involved de novo assembly of 75X reads from the Texel ewe into contigs and scaffolds. Once that was completed, sequences from both animals were used for gap filling. Version 2.0 of the ovine whole genome reference sequence (Oar v2.0) was publicly released in February, 2011 and Oar v3.1 was released in October, 2012 through NCBI GenBank. The ovine whole genome reference sequence is being used by researchers worldwide to explore genetic regions of interest. These regions likely contain genes and regulatory sequences that influence phenotypes in sheep. Thus, the reference assembly is contributing to improvement in the efficiency of research targeted towards sheep production as well as the use of sheep as a biomedical model. The reference genome assembly is now being annotated by Ensembl using an RNA dataset produced by the Roslin Institute. In this way, genes and genetic regulatory elements will be identified within the assembly. The RNA dataset produced by the Roslin Institute is the largest transcriptome analysis of any species in Ensembl, including human. The next version of the assembly (Oar v4.0) will include the annotation, and its release is expected by late 2015. Updated patches for some regions will likely be released before then. Kim Worley (BCM-HGSC) received funding from USDA/AFRI to fill gaps in the sheep assembly. Using technology and analyses developed within the project, about 89% of the gaps that existed within the assembly have been closed. This has raised the contig N50 from 41.7 kb to over 500 kb. However, there was only a minor shift in scaffold N50 (from 100.1 Mb to 101.2 Mb).
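The contig and scaffold N50 values reported above use the standard definition: the length L such that sequences of length L or longer contain at least half of the total assembled bases. The sketch below computes N50 on toy data and illustrates why gap closure can raise contig N50 sharply while leaving scaffold N50 nearly unchanged. The function name and all lengths are hypothetical and are not taken from the Oar assemblies.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of sequence lengths.

    N50 is the length of the shortest sequence in the smallest set of
    longest sequences that together cover at least half the total bases.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("lengths must not be empty")
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable for non-empty input")


if __name__ == "__main__":
    # Toy example: one scaffold made of four contigs separated by gaps.
    contigs_before = [40, 30, 20, 10]
    # Closing the gap between the two largest contigs merges them into one.
    contigs_after = [70, 20, 10]

    print("contig N50 before gap closure:", n50(contigs_before))
    print("contig N50 after gap closure: ", n50(contigs_after))
    # The scaffold already spanned all of these contigs, so its length (and
    # therefore the scaffold N50) changes only by the bases filled into gaps.
```

In this toy case merging two contigs more than doubles the contig N50, while the enclosing scaffold is essentially unchanged, which mirrors the pattern described in the report.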

Publications

  • Type: Journal Articles Status: Published Year Published: 2013 Citation: Dong, Y.M., Cockett, N., 3. c. (2013). A reference genome of the domestic goat (Capra hircus) generated by Illumina sequencing and whole genome mapping. Nature Biotechnology, 31, 135-41.
  • Type: Other Status: Published Year Published: 2012 Citation: Jiang, Y., Xie, M., Dalrymple, B.P., Kijas, J., Talbot, R., Archibald, A., Maddox, J.F., Faraut, T., Cockett, N. (2012). The domestic sheep reference genome assembly (pp. P1019). Cairns: Proc. 33rd International Society of Animal Genetics.


Progress 01/01/10 to 12/31/13

Outputs
Target Audience: Researchers
Changes/Problems: Nothing Reported
What opportunities for training and professional development has the project provided? Dr. Jiang Yu was hired as a post-doctoral fellow on the project. By the last year of the project, Dr. Yu was the key contact for analyses and updates on the reference genome sequence. He is now a research scientist at the Beijing Genomics Institute (BGI).
How have the results been disseminated to communities of interest? Outcomes of this project have been disseminated through the International Sheep Genomics Consortium (ISGC), which includes investigators from across the world who are engaged in research on livestock genomics. The ISGC communicated in a variety of ways, including bi-weekly conference calls and face-to-face workshops at the annual Plant and Animal Genome (PAG) meeting and the biennial meetings of the International Society of Animal Genetics (ISAG). Presentations on the reference genome sequence and assembly were made by the principal investigators each year at the PAG and ISAG meetings. A list server with over 75 consortium participants ensured that key decisions, results and outcomes were communicated by email. Presentations were also made at the annual American Sheep Industry (ASI) meetings, and an article on the reference genome assembly was included in the 2013 ASI magazine. In this way, US sheep producers were made aware of the research activities within the project. A manuscript describing the whole genome assembly (Oar v3.1), the RH map, and the linkage map is in preparation. Highlights of differences between the genome structure of sheep, cattle and goats are included in the manuscript. The analysis of about a terabyte of transcriptome data is also included. Allelic variation, allelic imbalance and copy number variation have been included in the manuscript as points of interest. Biological stories include reproduction, digestive tract enzymes, evolution of the rumen, lipid metabolism and evolution of wool.
What do you plan to do during the next reporting period to accomplish the goals? Nothing Reported

Impacts
What was accomplished under these goals? In 2010, sequence data were generated at two sequencing facilities (Beijing Genomics Institute and the Roslin Institute) from DNA of a Texel ewe and a Texel ram, respectively. The first step of the reference sequence assembly involved de novo assembly of 75X reads from the Texel ewe into contigs and scaffolds. Once that was completed, sequences from both animals were used for gap filling. Version 2.0 of the ovine whole genome reference sequence (Oar v2.0) was publicly released in February, 2011 and Oar v3.1 was released in October, 2012 through NCBI GenBank. The ovine whole genome reference sequence is being used by researchers worldwide to explore genetic regions of interest. These regions likely contain genes and regulatory sequences that influence phenotypes in sheep. Thus, the reference assembly is contributing to improvement in the efficiency of research targeted towards sheep production as well as the use of sheep as a biomedical model. The reference genome assembly is now being annotated by Ensembl using an RNA dataset produced by the Roslin Institute. In this way, genes and genetic regulatory elements will be identified within the assembly. The RNA dataset produced by the Roslin Institute is the largest transcriptome analysis of any species in Ensembl, including human. The next version of the assembly (Oar v4.0) will include the annotation, and its release is expected by late 2015. Updated patches for some regions will likely be released before then. Kim Worley (BCM-HGSC) received funding from USDA/AFRI to fill gaps in the sheep assembly. Using technology and analyses developed within the project, about 89% of the gaps that existed within the assembly have been closed. This has raised the contig N50 from 41.7 kb to over 500 kb. However, there was only a minor shift in scaffold N50 (from 100.1 Mb to 101.2 Mb).

Publications

  • Type: Journal Articles Status: Published Year Published: 2013 Citation: Dong, Y.M., Cockett, N., 3. c. (2013). A reference genome of the domestic goat (Capra hircus) generated by Illumina sequencing and whole genome mapping. Nature Biotechnology, 31, 135-41.
  • Type: Other Status: Published Year Published: 2012 Citation: Jiang, Y., Xie, M., Dalrymple, B.P., Kijas, J., Talbot, R., Archibald, A., Maddox, J.F., Faraut, T., Cockett, N. (2012). The domestic sheep reference genome assembly (pp. P1019). Cairns: Proc. 33rd International Society of Animal Genetics.


Progress 01/01/12 to 12/31/12

Outputs
Target Audience: Researchers
Changes/Problems: Nothing Reported
What opportunities for training and professional development has the project provided? Dr. Jiang Yu was hired as a post-doctoral fellow on the project. By the last year of the project, Dr. Yu was the key contact for analyses and updates on the reference genome sequence. He is now a research scientist at the Beijing Genomics Institute (BGI).
How have the results been disseminated to communities of interest? Outcomes of this project have been disseminated through the International Sheep Genomics Consortium (ISGC), which includes investigators from across the world who are engaged in research on livestock genomics. The ISGC communicated in a variety of ways, including bi-weekly conference calls and face-to-face workshops at the annual Plant and Animal Genome (PAG) meeting and the biennial meetings of the International Society of Animal Genetics (ISAG). Presentations on the reference genome sequence and assembly were made by the principal investigators each year at the PAG and ISAG meetings. A list server with over 75 consortium participants ensured that key decisions, results and outcomes were communicated by email. Presentations were also made at the annual American Sheep Industry (ASI) meetings, and an article on the reference genome assembly was included in the 2013 ASI magazine. In this way, US sheep producers were made aware of the research activities within the project. A manuscript describing the whole genome assembly (Oar v3.1), the RH map, and the linkage map is in preparation. Highlights of differences between the genome structure of sheep, cattle and goats are included in the manuscript. The analysis of about a terabyte of transcriptome data is also included. Allelic variation, allelic imbalance and copy number variation have been included in the manuscript as points of interest. Biological stories include reproduction, digestive tract enzymes, evolution of the rumen, lipid metabolism and evolution of wool.
What do you plan to do during the next reporting period to accomplish the goals? Nothing Reported

Impacts
What was accomplished under these goals? In 2010, sequence data were generated at two sequencing facilities (Beijing Genomics Institute and the Roslin Institute) from DNA of a Texel ewe and a Texel ram, respectively. The first step of the reference sequence assembly involved de novo assembly of 75X reads from the Texel ewe into contigs and scaffolds. Once that was completed, sequences from both animals were used for gap filling. Version 2.0 of the ovine whole genome reference sequence (Oar v2.0) was publicly released in February, 2011 and Oar v3.1 was released in October, 2012 through NCBI GenBank. The ovine whole genome reference sequence is being used by researchers worldwide to explore genetic regions of interest. These regions likely contain genes and regulatory sequences that influence phenotypes in sheep. Thus, the reference assembly is contributing to improvement in the efficiency of research targeted towards sheep production as well as the use of sheep as a biomedical model. The reference genome assembly is now being annotated by Ensembl using an RNA dataset produced by the Roslin Institute. In this way, genes and genetic regulatory elements will be identified within the assembly. The RNA dataset produced by the Roslin Institute is the largest transcriptome analysis of any species in Ensembl, including human. The next version of the assembly (Oar v4.0) will include the annotation, and its release is expected by late 2015. Updated patches for some regions will likely be released before then. Kim Worley (BCM-HGSC) received funding from USDA/AFRI to fill gaps in the sheep assembly. Using technology and analyses developed within the project, about 89% of the gaps that existed within the assembly have been closed. This has raised the contig N50 from 41.7 kb to over 500 kb. However, there was only a minor shift in scaffold N50 (from 100.1 Mb to 101.2 Mb).

Publications

  • Type: Journal Articles Status: Published Year Published: 2013 Citation: Dong, Y.M., Cockett, N., 3. c. (2013). A reference genome of the domestic goat (Capra hircus) generated by Illumina sequencing and whole genome mapping. Nature Biotechnology, 31, 135-41.


Progress 01/01/11 to 12/31/11

Outputs
OUTPUTS: "An ongoing project of the ISGC is development of a whole genome reference assembly. In 2010, sequence data were generated at two sequencing facilities (Beijing Genomics Institute and the Roslin Institute) from DNA of a Texel ewe and a Texel ram, respectively. The first step of the reference sequence assembly involved de novo assembly of 75X reads from the Texel ewe into contigs and scaffolds. Once that was completed, sequences from both animals were used for gap filling. Version 2.0 of the ovine whole-genome reference sequence (Oar v2.0) was publicly released in February, 2011 and Oar v3.1 was released in October, 2012 through NCBI GenBank. Several improvements to the assembly have been added in the past few months and include tracks for SNPs and annotations. These additions will improve accessibility for people searching the genome sequence. Chromosome assemblies can be found at http://www.ncbi.nlm.nih.gov/assembly/GCA_000298735.1/ and the full assembly can be found at http://www.livestockgenomics.csiro.au/cgi-bin/gbrowse/oarv3.1/." PARTICIPANTS: Utah State University (USA), CSIRO Livestock Industries (Australia), AgResearch (New Zealand), The Roslin Institute (UK), and Baylor College of Medicine-Human Genome Sequencing Center (USA). TARGET AUDIENCES: Researchers PROJECT MODIFICATIONS: Nothing significant to report during this reporting period.

Impacts
In the last year, the ovine whole genome reference sequence has been used by researchers to explore genetic regions of interest. These regions likely contain genes and regulatory sequences that influence phenotypes in sheep. Thus, the reference assembly is contributing to improvement in the efficiency of sheep production as well as the use of sheep as a biomedical model.

Publications

  • Dong, Y.M., Cockett, N., & 3. c., (2012). A reference genome of the domestic goat (Capra hircus) generated by Illumina sequencing and whole genome mapping: Nature Biotechnology, doi:10.1038/nbt.2478. (Published).
  • Jiang, Y., Xie, M., Dalrymple, B.P., Kijas, J., Talbot, R., Archibald, A., Maddox, J.F., Faraut, T., & Cockett, N., 2012. The domestic sheep reference genome assembly. Proc. 33rd International Society of Animal Genetics, Cairns, Australia (PP1019). (Published).


Progress 01/01/10 to 12/31/10

Outputs
OUTPUTS: Members of the International Sheep Genomics Consortium (ISGC) have continued to refine the ovine genome assembly. A new version of the assembly (Oarv3.0) will be released within the next few months. Comparison of contig positions on the sequence scaffolds with locations in the genetic and RH maps has led to improvements in resolution of the assembly. This version will have fewer intra- and inter-scaffold gaps and fewer unmapped sequences than Oarv2.0. In addition, scaffolds and super-scaffolds will cover larger regions. PARTICIPANTS: Utah State University (USA), CSIRO Livestock Industries (Australia), AgResearch (New Zealand), University of Melbourne (Australia), University of Sydney (Australia), University of New England (Australia), Research Institute for the Biology of Farm Animals (Germany), The Roslin Institute (UK), and Baylor College of Medicine-Human Genome Sequencing Center (USA). TARGET AUDIENCES: Researchers PROJECT MODIFICATIONS: Nothing significant to report during this reporting period.

Impacts
The ovine whole genome reference sequence has been used by several researchers to explore genetic regions of interest. These regions likely contain genes and regulatory sequences that influence phenotypes in sheep. Thus, the reference assembly is a tool that advances studies in genomics research.

Publications

  • No publications reported this period