Source: UNIVERSITY OF MISSOURI submitted to
DEVELOPMENT OF OVINE EXPRESSED SEQUENCE TAGS FOR THE STUDY OF FEMALE REPRODUCTION
Sponsoring Institution
National Institute of Food and Agriculture
Project Status
TERMINATED
Funding Source
Reporting Frequency
Annual
Accession No.
0206372
Grant No.
2006-35616-16729
Project No.
MO-ASCG1130
Proposal No.
2005-05007
Multistate No.
(N/A)
Program Code
43.1
Project Start Date
Apr 1, 2006
Project End Date
Sep 30, 2010
Grant Year
2006
Project Director
Green, J.
Recipient Organization
UNIVERSITY OF MISSOURI
(N/A)
COLUMBIA,MO 65211
Performing Department
ANIMAL SCIENCES
Non Technical Summary
Reproductive loss represents a major economic drain on the livestock industry. As many as 1/3 of potential conceptuses are lost during the first month of gestation. The main emphasis of this work is to discover genes crucial to reproductive processes in sheep. Reproductive tissues undergo dramatic remodeling during the estrous cycle and pregnancy - the result of the expression of distinct genes at specific points in time. Here we propose to produce full-length normalized cDNA libraries from reproductive and embryonic tissues representing a variety of developmental stages and distinct times during the estrous cycle and pregnancy. The clones will be sequenced to obtain sequence information from the 5' ends of the cDNAs. These data will be made publicly available on the UMC website and in the Livestock EST Gene Family Database located at Texas A&M University. This proposal addresses one of the NRI's critical research issues, which is the Genomics of Future Food and Fiber Production. More specifically, it addresses Animal Genome Reagent and Tool Development, 43.1: 'large scale EST sequencing' in sheep in order to develop genetic tools useful to understand, and eventually improve, reproductive processes in livestock.
Animal Health Component
(N/A)
Research Effort Categories
Basic
100%
Applied
(N/A)
Developmental
(N/A)
Classification

Knowledge Area (KA)   Subject of Investigation (SOI)   Field of Science (FOS)   Percent
301                   3610                             1050                     30%
303                   3610                             1050                     30%
304                   3610                             1050                     40%
Goals / Objectives
The goal is to produce full-length normalized cDNA libraries from reproductive and embryonic tissues representing a range of developmental stages and collected from distinct times during the estrous cycle and pregnancy. The clones will be sequenced to obtain sequence information from the 5' ends of the cDNAs.
Project Methods
The templates will be prepared by rolling-circle amplification and the cDNAs will be sequenced from the 5' end. Once the sequences are available, they will be submitted to GenBank and the cDNAs themselves will be made available to outside researchers upon request. The sequences will be clustered and analyzed to identify putative SNPs, to define gene families within the clusters, to identify orthologs and to annotate the ESTs with gene ontology terms from related vertebrate genes. All results from the project will be made publicly available on the UMC project website and in the Livestock EST Gene Family Database located at Texas A&M University.

Progress 04/01/06 to 09/30/10

Outputs
OUTPUTS: The initial goal of this project was to generate full-length cDNA libraries from reproductive and embryonic tissues and to obtain sequence information from them. Samples were harvested from 'early development' (oocytes, embryos and D13 & D16 conceptuses), from 'late development' (D20 embryos/placenta and D36 embryos/placenta), from the ovary (follicles and corpora lutea on Days 0, 3, 6, 10, 13 and 16; from pregnant and nonpregnant animals) and from female reproductive tracts (oviduct and endometrium from pregnant and nonpregnant animals). Additional collections were obtained from a range of other adult and neonatal organs (skeletal muscle, brain, liver, spleen, thymus, testes). The picked clones were sequenced from the 5' ends of the cDNAs. Each library possessed tissue-specific sequence tags to permit clone tracking and to connect a particular EST with its tissue of origin. These preliminary efforts generated over 12,000 sequences by traditional Sanger sequencing, and all of the sequences were deposited in publicly available databases. A parallel effort involved sequencing the libraries on an Illumina Genome Analyzer II. Commercial and publicly available sequence assembly programs (NextGene and CAP3) were applied sequentially to perform de novo assembly of transcripts that are highly accurate (typically greater than 99 percent identical to known ovine transcripts). The Illumina sequencing approach was found to be an extremely cost-effective way to generate cDNA sequences. Applying this pipeline to the first round of library sequencing generated approximately 103,000 contigs greater than 100 bp in length, with an average length of approximately 500 bp, which is similar to the length of a typical EST read. Of these contigs, 64 percent matched known ESTs in the public domain. Another 32 percent matched known human or bovine genes. Most of the remainder represented ovine-specific transcripts (e.g. ovine endogenous retrovirus, ovine-specific gene family expansions, etc.). A subsequent round of library sequencing has been performed, and substantial improvements in the number and quality of the resulting transcript contigs were obtained (i.e. an increase in average contig length and a decrease in the number of assembled sequences, as contigs representing different regions of the same transcripts were bridged). The sequences were combined with all publicly available ovine ESTs in CAP3 to produce an additional transcript list. The current reference consists of 72,945 sequences that are greater than 100 bp and that can be annotated to known human or bovine reference sequences. Included in this sequence set are rather large contigs that represent transcripts known to be several thousand bases in length. Examples include low density lipoprotein receptor-related protein 2 (15,735 bases) and myosin heavy chain 9 (7,507 bases); a full-length mitochondrial genome was also assembled (~16,600 bp). The project website is at: http://genome.rnet.missouri.edu/Ovine/.
PARTICIPANTS: Shasta Cernea (undergraduate researcher) - Shasta has been performing some basic bioinformatics on the EST clusters currently available. This work was part of her senior project. Tina Egen (Research Specialist) - Tina coordinated the animal breeding and organized the tissue collection and isolation of total RNA from the target organs. Sriparna Bagchi (postdoctoral trainee) - As part of her training, Sriparna performed nearly all the molecular biology aspects of the project (mRNA isolation; library production and characterization).
TARGET AUDIENCES: Scientists focused on comparative gene expression and reproductive physiology.
PROJECT MODIFICATIONS: New high-throughput sequencing technologies permitted the generation of long transcripts at a fraction of the cost of the traditional sequencing approaches that were proposed in the initial application. These new sequencing techniques are being employed in parallel with classical sequencing to generate substantially more data for the scientific community than would have been possible at the time the project was initiated.
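The contig figures reported above (number of contigs greater than 100 bp and their average length) are simple summary statistics over an assembly. As a minimal sketch, assuming the assembled contigs are available as FASTA-formatted text, they could be computed as follows. The function names and the toy input are hypothetical and are not taken from the project's pipeline.

```python
from statistics import mean


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def summarize_contigs(lines, min_length=100):
    """Count contigs strictly longer than min_length and report their mean length."""
    lengths = [len(seq) for _, seq in read_fasta(lines) if len(seq) > min_length]
    return {
        "count": len(lengths),
        "mean_length": mean(lengths) if lengths else 0.0,
    }


# Toy input (hypothetical): three contigs of 120, 40 and 280 bases.
example = [
    ">contig_1", "ACGT" * 30,
    ">contig_2", "ACGT" * 10,
    ">contig_3", "A" * 200, "C" * 80,
]
print(summarize_contigs(example))
```

In this toy input only the 120-base and 280-base contigs pass the 100 bp threshold; the 40-base contig is excluded from both the count and the mean.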

Impacts
This work using a sheep model will help in the development of genetic tools that will improve the understanding, and eventually enable the improvement, of reproductive processes in livestock. The results from this work have begun to illustrate some of the similarities and differences that exist between related economically important animals (e.g. cattle, goats and sheep) in regard to their transcriptional regulation during pregnancy and development. The use of high-throughput sequencing of sheep cDNA libraries has dramatically increased the amount of information available regarding the transcriptome in sheep, for a fraction of the cost of traditional EST projects. Furthermore, the development of a de novo assembly pipeline for building transcripts has provided an opportunity to establish a detailed transcriptome from an agriculturally important species. Once in place, de novo assembled transcriptomes can serve as reference templates for expression profiling, even in the absence of a sequenced and annotated genome.

Publications

  • Green JA and Wells K (2010) De novo construction of transcriptomes. Plant and Animal Genome Meeting January 9 - 13, 2010, San Diego CA; Abstract P117


Progress 04/01/09 to 03/31/10

Outputs
OUTPUTS: The goal of this project was to produce full-length cDNA libraries from reproductive and embryonic tissues. Tissues were harvested from 'early development' (oocytes, embryos and D13 & D16 conceptuses), from 'late development' (D20 embryos/placenta and D36 embryos/placenta), from the ovary (follicles and corpora lutea on Days 0, 3, 6, 10, 13 and 16; from pregnant and nonpregnant animals), from the female reproductive tract (oviduct and endometrium from pregnant and nonpregnant animals) and from a range of other adult and neonatal organs (skeletal muscle, brain, liver, spleen, thymus, testes). The clones were sequenced from the 5' ends of the cDNAs. Each library possessed tissue-specific sequence tags to permit clone tracking and to connect a particular EST with its tissue of origin. These preliminary efforts generated over 12,000 sequences by traditional Sanger sequencing, and all of the sequences were deposited in publicly available databases. A parallel effort involved sequencing the libraries on an Illumina Genome Analyzer II. Commercial and publicly available sequence assembly programs (NextGene and CAP3) were then applied sequentially to perform de novo assembly of transcripts that are highly accurate (typically greater than 99 percent identical to known ovine transcripts). The Illumina sequencing approach was found to be an extremely cost-effective way to generate cDNA sequences. Applying this pipeline to the first round of library sequencing generated approximately 103,000 contigs greater than 100 bp in length, with an average length of approximately 500 bp, which is similar to the length of a typical EST read. Of these contigs, 64 percent matched known ESTs in the public domain. Another 32 percent matched known human or bovine genes. Most of the remainder represented ovine-specific transcripts (e.g. ovine endogenous retrovirus, ovine-specific gene family expansions, etc.). A second round of library sequencing has been performed, and substantial improvements in the number and quality of the resulting transcript contigs are expected (i.e. an increase in average contig length and a decrease in the number of assembled sequences, as contigs representing different regions of the same transcripts are bridged). The project website is at: http://genome.rnet.missouri.edu/Ovine/.
PARTICIPANTS: Shasta Cernea (undergraduate researcher) - Shasta has been performing some basic bioinformatics on the EST clusters currently available. This work was part of her senior project. Tina Egen (Research Specialist) - Tina coordinated the animal breeding and organized the tissue collection and isolation of total RNA from the target organs.
TARGET AUDIENCES: Scientists focused on comparative gene expression and reproductive physiology.
PROJECT MODIFICATIONS: New high-throughput sequencing technologies permit the generation of transcripts at a fraction of the cost of the traditional sequencing approaches that were proposed in the initial application. These new sequencing techniques are being employed in parallel with classical sequencing to generate substantially more data for the scientific community than would have been possible at the time the project was initiated.
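The report states that each library carried a tissue-specific sequence tag so that an EST can be traced back to its tissue of origin, but it does not give the tag sequences or their position in the clone. The following is a minimal sketch of that kind of lookup, assuming short tags located near the start of each read. The tag sequences, the 30-base search window and the function name are all hypothetical.

```python
# Hypothetical tag-to-tissue table; the real tag sequences are not given in the report.
TISSUE_TAGS = {
    "ACGTAC": "D13 conceptus",
    "TGCATG": "D16 conceptus",
    "GATCGA": "endometrium",
}


def assign_tissue(read, tags=TISSUE_TAGS, search_window=30):
    """Return the tissue whose tag occurs in the first search_window bases of the read.

    Returns None when no tag is found or when tags from more than one
    tissue are found, so ambiguous reads are never assigned.
    """
    window = read.upper()[:search_window]
    hits = {tissue for tag, tissue in tags.items() if tag in window}
    return hits.pop() if len(hits) == 1 else None


print(assign_tissue("ggACGTACtttttttt"))   # one tag present
print(assign_tissue("tttttttttttttttt"))   # no tag present
print(assign_tissue("ACGTACGATCGAtttt"))   # two different tags present
```

Leaving ambiguous reads unassigned is a deliberate choice in this sketch: a read with two tags is more likely a chimera or a chance match than a clone with a known origin.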

Impacts
This work using a sheep model will help in the development of genetic tools that will improve the understanding, and eventually enable the improvement, of reproductive processes in livestock. These results have begun to illustrate some of the similarities and differences that exist between related economically important animals (e.g. cattle, goats and sheep) in regard to their transcriptional regulation during pregnancy and development. The use of high-throughput sequencing of sheep cDNA libraries has dramatically increased the amount of information available regarding the transcriptome in sheep, for a fraction of the cost of traditional EST projects. Furthermore, the development of a de novo assembly pipeline for building transcripts has provided an opportunity to establish a detailed transcriptome from an agriculturally important species. Once in place, de novo assembled transcriptomes can serve as reference templates for expression profiling, even in the absence of a sequenced and annotated genome.

Publications

  • Green JA and Wells K (2010) De novo construction of transcriptomes. Plant and Animal Genome Meeting January 9 - 13, 2010, San Diego CA; Abstract P117


Progress 04/01/08 to 03/31/09

Outputs
OUTPUTS: The goal of this project is to produce full-length cDNA libraries from reproductive and embryonic tissues. The clones will be sequenced to obtain sequence information from the 5' ends of the cDNAs. Each library possesses tissue-specific sequence tags to permit clone tracking and to connect a particular EST with its tissue of origin. The sequences will be clustered and analyzed to identify putative SNPs, to define gene families within the clusters, to identify orthologs and to annotate the ESTs with gene ontology terms from related vertebrate genes. All results from the project will be made publicly available by depositing them in GenBank. Harvesting of the reproductive and developmental tissues was initiated in the Fall of 2006 (the breeding season following the start of the grant) and completed in early 2007. Tissues were harvested from 'early development' (oocytes, embryos and D13 & D16 conceptuses), from 'late development' (D20 embryos/placenta and D36 embryos/placenta), from the ovary (follicles and corpora lutea on Days 0, 3, 6, 10, 13 & 16; from pregnant and nonpregnant animals), from the female reproductive tract (oviduct and endometrium from pregnant and nonpregnant animals) and from a range of other adult and neonatal organs (skeletal muscle, brain, liver, spleen, thymus, testes). A post-doctoral scientist was hired to work full time on the project. Library production has been completed for all of the tissues except the morula and blastocyst (embryo) libraries. Small-scale test sequencing has been performed to assess the quality of each library and to provide some preliminary sequence information for the overall project. These preliminary efforts have generated over 12,000 sequences so far. The project website, which is updated regularly as more information is generated, is at: http://genome.rnet.missouri.edu/Ovine/. Those libraries that exhibit good quality and complexity (i.e. large numbers of recombinants) have been pooled, and additional test sequencing has confirmed the proper representation of the component libraries within each mixed library. These mixed libraries will be used to generate additional single-pass sequences. A parallel effort has involved analyzing some of the libraries by 'Solexa' sequencing on an Illumina Genome Analyzer II. The sequential application of commercial and publicly available sequence assembly programs (NextGene and CAP3) has permitted the de novo assembly of theoretical transcripts that are highly accurate (typically ≥99% identical to known ovine transcripts). This approach appears to be an extremely cost-effective way to generate cDNA sequences. As an example, applying this pipeline to conceptus and early placenta libraries (D13 conceptus, D16 conceptus, D20 placenta) generated >19,000 contigs greater than 100 bp in length, with an average length of ~400 bp, which is comparable to the length of a typical EST read. Of these contigs, 67% matched known ESTs in the public domain. The remaining 32% all matched known human or bovine genes, with the exception of a small proportion of known ovine-specific transcripts (e.g. ovine endogenous retrovirus).
PARTICIPANTS: Sriparna Bagchi (postdoctoral trainee) - Sriparna has performed nearly all the molecular biology aspects of the project (mRNA isolation; library production and characterization). Shasta Cernea (undergraduate working on a Senior Project) - Shasta has been performing some basic bioinformatics on the EST clusters currently available. Tina Egen (Research Specialist) - Tina coordinated the animal breeding and organized and assisted with tissue collections and isolation of total RNA from the target organs.
TARGET AUDIENCES: Scientists focused on comparative gene expression and reproductive physiology.
PROJECT MODIFICATIONS: New high-throughput sequencing technologies permit the generation of transcripts at a fraction of the cost of traditional sequencing approaches. These new sequencing techniques are being employed in parallel with classical sequencing to generate substantially more data for the scientific community than would have been possible at the time the project was initiated.
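Assembly accuracy is reported above as percent identity to known ovine transcripts (typically ≥99%). The report does not say how identity was calculated, and definitions vary between tools. As a minimal sketch, assuming two sequences that have already been aligned to equal length with '-' marking gaps, and counting a gap opposite a base as a mismatch, the calculation could look like this. The function name and example sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of aligned columns in which both sequences carry the same base.

    Assumes the inputs are pre-aligned to equal length with '-' for gaps.
    Columns where both sequences have a gap are skipped; a gap opposite
    a base counts as a mismatch (one of several conventions in use).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" and b == "-":
            continue
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Toy example (hypothetical sequences): 9 of 10 aligned bases agree.
print(percent_identity("ACGTACGTAC", "ACGTACGTAT"))
```

At the 99% level reported here, a 400-base contig would differ from its reference transcript at no more than about four positions.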

Impacts
This work using a sheep model will help in the development of genetic tools that will improve the understanding, and eventually enable the improvement, of reproductive processes in livestock. These results have begun to illustrate some of the similarities and differences that exist between related economically important animals (e.g. cattle, goats and sheep) in regard to their transcriptional regulation during pregnancy and development. The development of a de novo assembly pipeline for building transcripts has the potential to define and establish transcriptomes from agriculturally important species for a fraction of the cost of traditional sequencing projects.

Publications

  • Meeting Abstract: JA Green, S Bagchi, C Elsik, DH Keisler, RS Prather, MF Smith, GK Springer, JF Taylor (2009). Development of ovine expressed sequence tags for the study of female reproduction. USDA-NRI Animal Genome Annual Investigator Meeting, 9 January 2009; San Diego, CA.


Progress 04/01/07 to 03/31/08

Outputs
OUTPUTS: The goal of this project is to produce full-length cDNA libraries from reproductive and embryonic tissues representing a range of developmental stages and collected from distinct times during the estrous cycle and pregnancy. The clones will be sequenced to obtain sequence information from the 5' ends of the cDNAs. Once the sequences are available, they will be submitted to GenBank and the cDNAs themselves will be made available to outside researchers upon request. Each library possesses tissue-specific sequence tags to permit clone tracking and to connect a particular EST with its tissue of origin. The sequences will be clustered and analyzed to identify putative SNPs, to define gene families within the clusters, to identify orthologs and to annotate the ESTs with gene ontology terms from related vertebrate genes. All results from the project will be made publicly available on the UMC project website and in the Livestock EST Gene Family Database located at Texas A&M University. Harvesting of the reproductive and developmental tissues was initiated in the Fall of 2006 (the breeding season following the start of the grant) and completed in early 2007. Tissues were harvested from 'early development' (oocytes, embryos and D13 & D16 conceptuses), from 'late development' (D20 embryos/placenta and D36 embryos/placenta), from the ovary (follicles and corpora lutea on Days 0, 3, 6, 10, 13 & 16; from pregnant and nonpregnant animals), from the female reproductive tract (oviduct and endometrium from pregnant and nonpregnant animals) and from a range of other adult and neonatal organs (skeletal muscle, brain, liver, spleen, thymus, testes). A post-doctoral scientist has been hired to work full time on the project. All RNAs have been extracted and purified. cDNA library production is completed for all of the tissues except the oocyte and embryo libraries. Small-scale test sequencing has been performed to assess the quality of each library and to provide some preliminary sequence information for the overall project. These preliminary efforts have generated approximately 10,000 sequences so far. The project website, which is updated regularly as more information is generated, is at: http://genome.rnet.missouri.edu/Ovine/. Those libraries that exhibit good quality and complexity (i.e. large numbers of recombinants) are being pooled, and additional test sequencing is underway to confirm proper representation of the component libraries within the mixture. Once these mixed libraries are checked for good quality, they will be used for large-scale single-pass sequencing.
PARTICIPANTS: Nothing significant to report during this reporting period.
TARGET AUDIENCES: Nothing significant to report during this reporting period.
PROJECT MODIFICATIONS: Nothing significant to report during this reporting period.

Impacts
This work using a sheep model will help in the development of genetic tools that will improve the understanding, and eventually enable the improvement, of reproductive processes in livestock. These results will also help illustrate some of the similarities and differences that exist between related economically important animals (e.g. cattle, goats and sheep) in regard to their transcriptional regulation during pregnancy and development.

Publications

  • Meeting Abstract: JA Green, S Bagchi, C Elsik, DH Keisler, RS Prather, MF Smith, GK Springer, JF Taylor (2008). Development of ovine expressed sequence tags for the study of female reproduction. USDA-NRI Animal Genome Annual Investigator Meeting, 11 January 2008; San Diego, CA.


Progress 04/01/06 to 03/31/07

Outputs
The goal of this project is to produce full-length cDNA libraries from reproductive and embryonic tissues representing a range of developmental stages and collected from distinct times during the estrous cycle and pregnancy. The clones will be sequenced to obtain sequence information from the 5' ends of the cDNAs. Once the sequences are available, they will be submitted to GenBank and the cDNAs themselves will be made available to outside researchers upon request. The sequences will be clustered and analyzed to identify putative SNPs, to define gene families within the clusters, to identify orthologs and to annotate the ESTs with gene ontology terms from related vertebrate genes. All results from the project will be made publicly available on the UMC project website and in the Livestock EST Gene Family Database located at Texas A&M University. Harvesting of the reproductive and developmental tissues was initiated in the Fall of 2006 (the breeding season following the start of the grant) and completed in early 2007. Tissues were harvested from 'early development' (oocytes, embryos and conceptuses), from 'late development' (D20 embryos/placenta and D36 embryos/placenta), from the ovary (follicles and corpora lutea on Days 0, 3, 6, 10, 13 & 16; from pregnant and nonpregnant animals), from the female reproductive tract (oviduct and endometrium from pregnant and nonpregnant animals) and from a range of other adult and neonatal organs (skeletal muscle, brain, liver, spleen, thymus, testes). A post-doctoral scientist has been hired to work full time on the project. All RNAs have been extracted and purified. cDNA library production is underway and small-scale test sequencing has been initiated to assess library quality. Once the libraries are complete and checked for good quality, they will be pooled for single-pass sequencing.

Impacts
This work in sheep will help to develop genetic tools that will improve our understanding, and eventually enable the improvement, of reproductive processes in livestock. This work will also help illustrate some of the similarities and differences that exist between related economically important animals (e.g. cattle and sheep) in regard to their transcriptional regulation.

Publications

  • Meeting Abstract: JA Green, C. Elsik, DH Keisler, RS Prather, MF Smith, GK Springer, JF Taylor (2007). Development of ovine expressed sequence tags for the study of female reproduction. USDA-NRI Animal Genome Annual Investigator Meeting, 12 January, 2007; San Diego, CA.