Source: UTAH STATE UNIVERSITY submitted to
BUILDING THE SHEEP GENOMES DATABASE
Sponsoring Institution
National Institute of Food and Agriculture
Project Status
TERMINATED
Funding Source
Reporting Frequency
Annual
Accession No.
1000798
Grant No.
2013-67015-21372
Project No.
UTA-01151
Proposal No.
2013-00976
Multistate No.
(N/A)
Program Code
A1201
Project Start Date
Sep 1, 2013
Project End Date
Aug 31, 2017
Grant Year
2013
Project Director
Cockett, N. E.
Recipient Organization
UTAH STATE UNIVERSITY
(N/A)
LOGAN,UT 84322
Performing Department
Cooperative Extension
Non Technical Summary
The proposal outlines efforts to obtain whole genome and exon sequences for a large number of sheep and collect those sequences, as well as others contributed by the research community, into a publicly available database referred to as the Sheep Genomes Database. The resulting database will accelerate searches for genetic regions and genes influencing phenotypes in sheep, as well as facilitate comparative studies across the genomes of ruminant species.
Animal Health Component
0%
Research Effort Categories
Basic
100%
Applied
(N/A)
Developmental
(N/A)
Classification

Knowledge Area (KA) | Subject of Investigation (SOI) | Field of Science (FOS) | Percent
304 | 3699 | 1080 | 100%
Knowledge Area
304 - Animal Genome;

Subject Of Investigation
3699 - Sheep and wool, general/other;

Field Of Science
1080 - Genetics;
Goals / Objectives
In this project, we intend to 1) collect whole genome sequence from 100 genetically diverse sheep; 2) sequence the protein-coding exons (i.e. exome) of 145 additional sheep that display divergent and specialized breed differences; 3) identify sequence variants within and between these genomes; and 4) make the data available to the research community through the development of the Sheep Genomes Database.
Year 1: Whole genome sequences from 100 animals will be obtained and downloaded into the variant detection pipeline in the first year of the project. The design of the NimbleGen capture chip will be completed by BCM-HGSC and used to amplify the exomes of 145 animals. These exome sequence data will also be downloaded into the pipeline and the analysis of genetic variation will commence.
Year 2: The variant detection and annotation pipelines will be implemented and refined as the analyses dictate. Design of the database will be discussed and finalized by the ISGC. Sequences and variant tracks will be uploaded into the database, as well as other whole genome sequences contributed by partnering institutions.
Year 3: Information about the database, including instructions on submission of sequence data and analysis of variation, will be disseminated to the research community. Additional whole genome sequences will be added as they are submitted. Detection, analysis and annotation of variation will continue.
Project Methods
The 100 sheep whose whole genome sequences will be included in this project comprise 75 animals that have been resequenced by members of the International Sheep Genomics Consortium (ISGC) and an additional 25 animals that will be resequenced within the next six months using other ISGC funds. The sequence reads will be used to develop reference-guided assemblies for each animal by mapping reads against the sheep genome assembly version 3.1 (Oar v3.1). Sequence variants within and between these genomes will be analyzed and annotated. The final component of the project will be the development of the Sheep Genomes Database, which will make the data publicly available to the research community.

Progress 09/01/13 to 08/31/17

Outputs
Target Audience: Researchers
Changes/Problems: All objectives of the Sheep Genomes Database project have been successfully completed.
What opportunities for training and professional development has the project provided? Information on accessing the Sheep Genomes Database can be found through the National Center for Biotechnology Information (NCBI) and European Variation Archive (EVA) websites.
How have the results been disseminated to communities of interest? H. Daetwyler (La Trobe University, Australia) presented the final report of the Sheep Genomes Database at the January 15, 2018 meeting of the International Sheep Genomics Consortium and gave a presentation at the NRSP8 Cattle/Sheep meeting on January 14, 2018. Both meetings were held in San Diego, CA.
What do you plan to do during the next reporting period to accomplish the goals? The Sheep Genomes Database project funded by USDA has been completed. However, Run 3 will align all Run 2 sequences and any newly submitted genomes to the Oar_rambouillet_v1.0 assembly. Run 3 is expected to be completed at the end of 2018.

Impacts
What was accomplished under these goals? The Sheep Genomes Database, funded by a 2013 NIFA/AFRI grant, contains whole genome sequences from 935 sheep representing 21 countries and 69 breeds. Over 50 million SNPs and indels were identified with high confidence using two variant-calling platforms. Data in the database are publicly available (http://www.ebi.ac.uk/eva/?eva-study=PRJEB14685) via the European Variation Archive (EVA), which provides public access to genome information, data storage and variant accessioning. Variants are available either as raw (unfiltered) data or as filtered data following application of a comprehensive QC protocol, and information about the variants can be found through dbSNP.
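
The report does not say how calls from the two variant-calling platforms were combined. One common way to obtain a high-confidence set is to keep only variants reported by both callers; the following is a minimal sketch under that assumption. The `Variant` type, the function name and the example records are hypothetical and are not taken from the project pipeline.

```python
from typing import Iterable, List, NamedTuple


class Variant(NamedTuple):
    """Hypothetical minimal variant key: chromosome, position, ref and alt alleles."""
    chrom: str
    pos: int
    ref: str
    alt: str


def concordant_variants(calls_a: Iterable[Variant], calls_b: Iterable[Variant]) -> List[Variant]:
    """Return the variants reported by both callers, sorted by chromosome and position."""
    return sorted(set(calls_a) & set(calls_b))


# Made-up example calls from two callers (assumption: both use the same coordinates).
caller_a = [
    Variant("1", 1050, "A", "G"),
    Variant("1", 2300, "C", "T"),
    Variant("2", 500, "G", "GA"),
]
caller_b = [
    Variant("1", 1050, "A", "G"),
    Variant("2", 500, "G", "GA"),
    Variant("2", 900, "T", "C"),
]

shared = concordant_variants(caller_a, caller_b)
for variant in shared:
    print(variant.chrom, variant.pos, variant.ref, variant.alt)
```

In practice, indels often need normalization (left-alignment and allele trimming) before an exact-match comparison like this is meaningful.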

Publications

  • Type: Journal Articles Status: Published Year Published: 2014 Citation: Cockett, N. (2014). Responsibilities of being the land-grant institution for the state of Utah. J. Developmental and Sustainable Agriculture, 9, 1-7


Progress 09/01/15 to 08/31/16

Outputs
Target Audience: Researchers
Changes/Problems: A one-year no-cost extension was requested on this NIFA grant because of additional work that is being completed at the Baylor College of Medicine. The second objective of the project was dropped; instead, tissues from a Rambouillet animal will be assayed using PacBio IsoSeq in order to obtain long-read sequences from RNA. These data will contribute to the Ovine FAANG Project.
What opportunities for training and professional development has the project provided? Presentations on the Sheep Genomes Database were given at the NRSP8 Animal Genomics meeting on January 15, 2017 and the International Sheep Genomics Consortium (ISGC) meeting on January 16, 2017 in San Diego. Updates on Run 1 and the proposed Run 2 analyses were presented and discussed. The following opportunities were inadvertently omitted from the Progress Report for the first year of this grant: A post-doctoral research fellow will be hired on the project. Oversight of the postdoc will be provided by Dr. James Kijas, CSIRO Animal, Food and Health Sciences, Queensland, Australia.
How have the results been disseminated to communities of interest? A poster on the project was presented at the USDA AFRI PD meeting on January 13, 2017 in San Diego, CA. The following dissemination was inadvertently omitted from the Progress Report for the first year of this grant: Outcomes of this project will be disseminated through the International Sheep Genomics Consortium (ISGC). In addition, presentations will be made at meetings such as the Plant and Animal Genome (PAG), International Society of Animal Genetics (ISAG) and the World Congress on Genetics Applied to Livestock Production (WCGALP).
What do you plan to do during the next reporting period to accomplish the goals? A total of 950 whole genome sequences (the original 455 genomes analyzed in Run 1 and 495 new sequences) will be analysed in Run 2 and incorporated into the database in 2017. In addition, RNA long-read sequences will be generated from Benz 2616, a Rambouillet ewe, using PacBio IsoSeq analysis and contributed to the Ovine FAANG Project. The following plan of work was inadvertently omitted from the Progress Report for the first year of this grant: In this project, we intend to 1) collect whole genome sequence from 100 genetically diverse sheep; 2) sequence the protein-coding exons (i.e. exome) of 145 additional sheep that display divergent and specialized breed differences; 3) identify sequence variants within and between these genomes; and 4) make the data available to the research community through the development of the Sheep Genomes Database. Year 2: The variant detection and annotation pipelines will be implemented and refined as the analyses dictate. Design of the database will be discussed and finalized by the ISGC. Sequences and variant tracks will be uploaded into the database, as well as other whole genome sequences contributed by partnering institutions. Whole genome sequence from additional sheep will be obtained over the coming year in order to meet the target of 100 animals. The animals will be drawn from the ISGC DNA Repository, which contains approximately 5,000 DNA samples from over 100 breeds. Samples will be selected to ensure a broad representation of breeds and genetic diversity, including sheep from Europe, Asia-Pacific, Africa, the Middle East and the Americas. The emphasis on diversity will ensure a broad and unbiased spectrum of polymorphism, rather than a subset of variants that are at high frequency only within a subset of breeds, such as European-derived animals.
Sequences will also be collected from three trios (9 individuals), each composed of a ram, a ewe and their offspring. The addition of trios will assist in the direct observation of phase and construction of haplotype blocks, as well as with aspects of variant calling quality control (QC).
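
One standard way trios support variant-calling QC is a Mendelian consistency check: at each site, the offspring genotype must be explainable by one allele from the ram and one from the ewe. The sketch below illustrates that idea for diploid autosomal sites. The function names and example genotypes are hypothetical, and the report does not state that the project used this exact check.

```python
from typing import Sequence, Tuple

Genotype = Tuple[str, str]  # two alleles at one diploid site


def is_mendelian_consistent(sire: Genotype, dam: Genotype, offspring: Genotype) -> bool:
    """True if the offspring could inherit one allele from each parent."""
    first, second = offspring
    return (first in sire and second in dam) or (second in sire and first in dam)


def mendelian_error_rate(sites: Sequence[Tuple[Genotype, Genotype, Genotype]]) -> float:
    """Fraction of (sire, dam, offspring) sites that violate Mendelian inheritance."""
    if not sites:
        raise ValueError("at least one site is required")
    errors = sum(1 for sire, dam, child in sites if not is_mendelian_consistent(sire, dam, child))
    return errors / len(sites)


# Made-up genotypes for one trio at three sites: (ram, ewe, offspring).
trio_sites = [
    (("A", "A"), ("A", "G"), ("A", "G")),  # consistent
    (("A", "A"), ("A", "A"), ("A", "G")),  # inconsistent: no parent carries G
    (("C", "T"), ("C", "T"), ("T", "T")),  # consistent
]

rate = mendelian_error_rate(trio_sites)
print(f"Mendelian error rate: {rate:.3f}")
```

A high error rate in a trio usually points to genotyping errors or a sample mix-up rather than true de novo mutation, which is why it is useful as a QC signal.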

Impacts
What was accomplished under these goals? The Sheep Genomes Database currently contains whole genome sequences from 455 sheep collected from around the world and aligned to Oar v3.1. Analysis of 250 of the sequences revealed over 80 million SNPs and indels with high confidence using two variant-calling platforms. Data in the database are publicly available via the European Variation Archive (EVA), which delivers key advantages concerning public access to genome information, data storage and variant accessioning. Variants are available as i) raw (unfiltered) or ii) filtered following application of a comprehensive QC protocol. The following accomplishments were inadvertently omitted from the Progress Report for the first year of this grant: In this project, we intend to 1) collect whole genome sequence from 100 genetically diverse sheep; 2) sequence the protein-coding exons (i.e. exome) of 145 additional sheep that display divergent and specialized breed differences; 3) identify sequence variants within and between these genomes; and 4) make the data available to the research community through the development of the Sheep Genomes Database. Year 1: Whole genome sequences from 100 animals will be obtained and downloaded into the variant detection pipeline in the first year of the project. The design of the NimbleGen capture chip will be completed by BCM-HGSC and used to amplify the exomes of 145 animals. These exome sequence data will also be downloaded into the pipeline and the analysis of genetic variation will commence. Whole genome sequencing promises to accelerate our ability to identify the genetic basis of phenotypic variation in domestic animals, and to understand aspects of their population history. The International Sheep Genomics Consortium (ISGC) has initiated establishment of the Sheep Genomes Database, a public repository designed to serve as a data portal and coordination point for the analysis of ovine whole genome sequences.
To begin populating the database, 73 sheep from 40 divergent breeds were sequenced to an average depth of 10-fold coverage per animal. Variant calling detected ~20 million SNPs, and 137 genomic regions were then identified with significantly reduced levels of heterozygosity, suggesting that they are likely to have experienced selection. A comparison of these regions against a completed SNP50-based scan for selection sweeps revealed a number of overlaps, including at the RXFP2 gene, associated with the poll/horn phenotype, and at pigmentation genes including MSRB3, MITF and ASIP. Strong selection sweeps were also detected surrounding genes involved in wool growth and development (EDAR, HOXC13); body mass, height and early maturation (LIN28B, IGFBP4, PPP2R2A); and disease resistance (MHC, SOCS1). These results advance our understanding of the genetic history of this important livestock species. The attributes of the Sheep Genomes Database will be presented to foster awareness and utilisation across the research community.
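
The idea of scanning for regions of reduced heterozygosity can be sketched as follows: average per-site heterozygosity in fixed, non-overlapping windows and flag windows that fall below a cutoff. The window size, the threshold and the example values are assumptions chosen for illustration. The report does not describe the statistical test used to call the 137 regions, so this is not the project's method.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def window_heterozygosity(sites: Iterable[Tuple[int, float]], window_size: int) -> Dict[int, float]:
    """Mean heterozygosity per non-overlapping window, keyed by window start position."""
    sums: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for position, heterozygosity in sites:
        start = (position // window_size) * window_size
        sums[start] += heterozygosity
        counts[start] += 1
    return {start: sums[start] / counts[start] for start in sorted(sums)}


def low_heterozygosity_windows(
    sites: Iterable[Tuple[int, float]], window_size: int, threshold: float
) -> List[int]:
    """Start positions of windows whose mean heterozygosity is below the threshold."""
    means = window_heterozygosity(sites, window_size)
    return [start for start, mean in means.items() if mean < threshold]


# Made-up (position, heterozygosity) values on a single chromosome.
example_sites = [(100, 0.40), (600, 0.30), (1200, 0.05), (1700, 0.02), (2500, 0.35)]

candidates = low_heterozygosity_windows(example_sites, window_size=1000, threshold=0.1)
print("Candidate windows start at:", candidates)
```

A real scan would typically derive the cutoff from the genome-wide distribution of window means rather than from a fixed value.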

Publications


    Progress 09/01/14 to 08/31/15

    Outputs
    Target Audience: Researchers
    Changes/Problems: Completion of the database is on target. No extension of the grant will be required.
    What opportunities for training and professional development has the project provided? A poster on the project will be presented at the USDA AFRI PD meeting on January 11, 2016 in San Diego.
    How have the results been disseminated to communities of interest? The International Sheep Genomics Consortium (ISGC) meeting was held on January 12, 2015 in conjunction with the PAG meeting in San Diego. Updates on the SheepGenomes Database were presented and discussed. Additional sheep genomes for inclusion in the database were identified.
    What do you plan to do during the next reporting period to accomplish the goals? Analysis of the sequence data will continue through the next six months. The database will go live at the conclusion of the USDA grant in April 2016.

    Impacts
    What was accomplished under these goals? Sequence data from 450 sheep genomes of more than 50 breeds are now included in the database. Input for all genomes is made via SRA deposition, which ensures that all data can be made publicly available. Analysis using two variant callers has resulted in over 80 million variants, including SNPs and insertions/deletions. Variants are available as raw (unfiltered) or filtered following application of a comprehensive QC protocol.

    Publications


      Progress 09/01/13 to 08/31/14

      Outputs
      Target Audience: Researchers
      Changes/Problems: Nothing Reported
      What opportunities for training and professional development has the project provided? The Sheep Genomics Workshop was held in Lincoln, NE on November 13 and 14, 2014. The workshop was conducted to facilitate discussions on the challenges that limit the application of genomic information in the U.S. sheep industry. Participants of the workshop developed a strategy that will result in improved genomic information and tools available to sheep producers. Another equally important goal of the workshop was the development of collaborative efforts among the workshop participants for the identification and collection of sheep samples to be included in the SheepGenomes Database.
      How have the results been disseminated to communities of interest? The International Sheep Genomics Consortium (ISGC) meeting was held on January 13, 2014 in conjunction with the PAG meeting in San Diego. Updates on the SheepGenomes Database were presented and discussed. The Sheep Genomics Workshop was held in Lincoln, NE on November 13 and 14, 2014. During the workshop, participants identified at least three U.S. sheep breeds that can be included in the SheepGenomes Database project.
      What do you plan to do during the next reporting period to accomplish the goals? Whole genome sequence (WGS) data from more than 450 sheep will be generated by early 2015 and deposited into the SheepGenomes Database. More than 1,000 additional genomes will be contributed by the end of 2015.

      Impacts
      What was accomplished under these goals? Genomic DNA from over 450 sheep has been collected, and their whole genome sequence (WGS) will be included in the SheepGenomes Database. Around 72 animals are from the Ovine HapMap project and another 96 animals are from the USMARC Sheep Diversity Panel. Additional samples have been collected in North Africa, Iran, New Zealand and Australia, as well as from Asian Mouflon, bighorn and thinhorn wild sheep.

      Publications