Source: GEORGE MASON UNIVERSITY submitted to NRP
BOVINE MICRORNA TRANSCRIPTOME ANALYSES: DISCOVERY, TISSUE SPECIFIC EXPRESSION PROFILE AND TARGET GENE PREDICTION
Sponsoring Institution
National Institute of Food and Agriculture
Project Status
COMPLETE
Funding Source
Reporting Frequency
Annual
Accession No.
0209208
Grant No.
2007-35205-17920
Cumulative Award Amt.
(N/A)
Proposal No.
2006-04817
Multistate No.
(N/A)
Project Start Date
May 1, 2007
Project End Date
Apr 30, 2012
Grant Year
2007
Program Code
[43.0]- Animal Genome
Recipient Organization
GEORGE MASON UNIVERSITY
4400 UNIVERSITY DRIVE
FAIRFAX,VA 22030
Performing Department
(N/A)
Non Technical Summary
A previously unknown, fundamental layer of gene regulation has recently been discovered. This gene regulation is controlled by small RNA molecules (micro-RNAs) that provide a targeting mechanism for cellular machinery that "silences" expression of protein coding genes in the cell. This type of gene regulation is apparently of extreme importance in biology, as it is conserved in organisms as diverse as plants, flies, worms, and humans. So far, there is no published data concerning this important mode of gene regulation in cattle. The objective of this project is to characterize micro-RNA expression in various tissues of cattle, and predict the genes that are targeted for silencing. The study will expand understanding of bovine gene regulation, and provide initial data to assist in determining the potential impact of micro-RNAs on animal development, productivity, and well-being.
Animal Health Component
(N/A)
Research Effort Categories
Basic
100%
Applied
(N/A)
Developmental
(N/A)
Classification

Knowledge Area (KA)    Subject of Investigation (SOI)    Field of Science (FOS)    Percent
304                    3399                              1040                      80%
304                    3499                              1040                      20%
Goals / Objectives
The existence of a class of genes whose products are small RNA molecules apparently having major roles in gene regulation has very recently been discovered. These genes produce untranslated RNA molecules that are processed into short (17-23 nucleotides in length) microRNAs forming part of RNA-protein (RISC) complexes. Specific RISC complexes have been shown to regulate diverse processes in growth, development, and immune response. The microRNA (miRNA) genes have been shown to be expressed in tightly regulated, tissue- and temporal-specific patterns in humans, mice, flies, worms, and even plant species. They appear to work mainly by targeting the RISC complex to complementary sequences in the untranslated sequences of RNAs that code for proteins (mRNA), preventing them from being translated into their protein products and thus providing a means to regulate gene expression after the messenger RNA has been produced. miRNA expression and its potential effects on physiology in bovine tissues are currently unknown. The current project has three objectives: (i) High-resolution discovery of bovine miRNA genes using in-silico analysis of the emerging draft genome sequence, validated by in-depth sequencing of small RNA fractions from a pool of diverse cattle tissues. (ii) Examination of the tissue specificity of cattle miRNA expression by characterization of miRNA expression profiles from 50 individual tissues. (iii) Prediction of the mRNA targets of miRNA regulation in each tissue examined.
Project Methods
Objective 1 will be pursued by two main approaches. First, an existing database of miRNA sequences discovered in other species (miRBase) will be compared to the bovine genome sequence (draft version 3) to identify likely bovine miRNA genes based on conservation of sequence between species. Previous studies have shown that the processed miRNA sequence for many genes, especially within mammals, is highly conserved, and so this process is likely to identify a large percentage of the total number of bovine miRNA genes. The genomic DNA surrounding the conserved sequence in the bovine genome will be examined to determine whether established hallmarks of miRNA gene structure are conserved, providing an initial indication that the gene is orthologous to that from the species in which it was discovered. An additional in silico analysis utilizing ab initio predictions based solely on genome sequence will also be used to broaden the pool of potential miRNA gene candidates. The second approach will be a verification of predicted genes using direct cloning and sequencing of small RNAs from bovine tissues. To be as comprehensive as possible, we plan to utilize a pool of multiple, diverse cattle tissues to provide breadth of representation, collect and clone the small RNA fraction from the pool, and subject the clones to intensive sequencing using next-generation sequencing technology capable of detecting miRNA expression even for genes expressed at low levels in highly specific tissues. The next-generation technology is capable of collecting sequence of >1,000,000 individual small-RNA fraction clones, ensuring that even miRNAs that are rare in the pool will be detected. The two approaches for Objective 1 are complementary. 
The predictions assist in sorting out cloning artifacts, such as degraded non-miRNA RNA contaminating the small RNA fraction from tissues, and the cloning/sequencing provides support that the predicted miRNA gene is functional. Objective 2 targets 50 distinct tissues for miRNA expression profiling using a lower depth of sequencing appropriate for individual tissues. A strategy of concatamerization of mature miRNA, and cloning of the concatamers for sequencing, will provide 5,000 to 10,000 sequences per tissue. Preliminary data indicate that this level of sequencing should be sufficient to provide a general profile of miRNA expression, determine whether cattle display tissue-specific expression, and identify individual miRNAs in each tissue for comparison to mRNA expression in Objective 3. Objective 3 will draw on data from the Bovine Gene Atlas, under construction by us under a separate NRI grant, to identify likely functional mRNA targets of miRNA regulation in cattle tissues. The Gene Atlas project is generating next-generation sequencing depth information for the mRNA population in the same tissue samples that will be used for miRNA analysis, providing direct co-expression data. The miRNA sequences in a given tissue will be compared to mRNA sequences to detect potential binding sites that target RISC complexes.
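
The comparison of miRNA and mRNA sequence described for Objective 3 can be illustrated with a minimal sketch. It assumes the common simplification that a candidate binding site is an exact match to the reverse complement of the miRNA "seed" (nucleotides 2-8); the project text does not state which matching rule was used, and the function names, the example miRNA, and the toy UTR below are illustrative only.

```python
def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA string (A/C/G/U only)."""
    pairs = {"A": "U", "U": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(rna.upper()))


def seed_sites(mirna: str, utr: str, seed_len: int = 7) -> list[int]:
    """Return 0-based UTR positions where a seed-complementary site starts.

    Assumption: the seed is nucleotides 2..(1 + seed_len) of the miRNA and
    only perfect matches count. Real target predictors use more features.
    """
    mirna = mirna.upper().replace("T", "U")
    utr = utr.upper().replace("T", "U")
    seed = mirna[1:1 + seed_len]          # skip nucleotide 1
    site = reverse_complement(seed)       # sequence expected in the mRNA
    hits = []
    pos = utr.find(site)
    while pos != -1:
        hits.append(pos)
        pos = utr.find(site, pos + 1)     # allow overlapping matches
    return hits


# Illustrative inputs (not project data)
example_mirna = "UGAGGUAGUAGGUUGUAUAGUU"
example_utr = "AAACUACCUCAGGGCUACCUCUU"
print(seed_sites(example_mirna, example_utr))
```

Exact seed matching produces many false positives on its own, which is one reason accurate 3' UTR annotation and co-expression data matter for narrowing the candidate list.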

Progress 05/01/07 to 04/30/12

Outputs
OUTPUTS: In the original grant we proposed making 50 miRNA libraries, each comprising 3,000 sequences (150,000 total sequences). Because of advances in sequencing technologies we were able to sequence 96 miRNA libraries from calf, fetus, and a cross of Wagyu and Black Angus, across 80 tissues. The total number of sequences was 7.9 million, of which 1.2 million aligned to the miRBase database. One of the challenges for miRNA target prediction from the bovine genome was that the 3' UTR regions were not well defined, as the automated annotations were based on protein sequence alignments that completely miss the UTR regions. We used the RNASeq data generated by our collaborators from 3 different tissues to annotate the UTR regions. This also helped in annotating the majority of the tags from the bovine gene atlas project. We will release the available data in raw and MySQL formats soon. We are also analyzing and validating the miRNA expression profile differences in the Horn/Poll tissues, for use in conjunction with published gene expression data to find inverse trends that identify possible targets. PARTICIPANTS: Besides the PIs on this project, Dr. Tara McDaneld (USDA, MARC, Clay Center, NE) and Dr. Le Ann Blomberg (USDA, BARC, Beltsville, MD) helped in preparing the miRNA libraries. Dr. Tad Sonstegard (USDA, BARC Beltsville) has generously offered sequencing support on the Illumina Genome Analyzer in his laboratory at no additional cost to this project other than the sequencing reagents. Naga Sridhar Betrapally (Bioinformatics PhD student at George Mason University) implemented a computational pipeline to cluster the expression results and further identify novel miRNAs. The analysis of these results will be submitted as an MS thesis in 2012. 
Samriddhi Goswami (Bioinformatics PhD student at George Mason University (GMU)) assisted in the bovine genome annotations and in performing the differential gene expression analysis. Kshma Aswath (Bioinformatics PhD student at GMU) worked on the bovine piRNAs and is interested in pursuing this topic for her PhD dissertation. TARGET AUDIENCES: Through the course of the project, we provided bioinformatics support for gene expression analysis in embryonic stem cell tissues to Dr. Carol Keefer of the University of Maryland. We helped annotate the markers involved in major commercially important traits in cattle for Prof. Yang Da and his colleagues on another NRI project, and helped analyze bovine digital gene expression tags for researchers at Texas A&M University. We established a research collaboration with Dr. David Lynn at Teagasc, Ireland for developing gene regulatory networks using the bovine gene atlas data. Provision of the miRNA atlas data through a query interface will help other bovine genome researchers as well as others working in livestock species. The algorithms developed for fast alignment are also suited to analysis of next-generation sequencing data. PROJECT MODIFICATIONS: Changes in sequencing technologies over the course of the grant period allowed us to determine 7.1 million sequences across 96 miRNA libraries. These data will be released to the public soon and will aid in the improvement of bovine genome annotation.

Impacts
- Generated miRNA libraries for 80 cattle tissues
- Developed algorithms for mapping reads to the bovine genome
- Determined expression levels for the different tissues
- Annotated the cattle genes for 3' and 5' UTR regions using RNASeq data
- Improved the annotation of the bovine gene atlas tags that are mostly mapped to 3' UTR regions
- Created mysql databases for miRNA sequences and targets
- In addition to miRNA, we have also generated the sequence data from fetal and adult testes
- Identified differences across these tissues and different developmental stages
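
Comparing expression levels across libraries sequenced to different depths usually requires some normalization. The sketch below shows one common choice, reads per million (RPM); the report does not say which normalization the project used, and the function name and the counts are hypothetical.

```python
def reads_per_million(counts: dict[str, int]) -> dict[str, float]:
    """Scale raw read counts for one library to reads per million (RPM).

    Assumption: `counts` maps a miRNA name to the number of reads assigned
    to it in a single library. An empty library yields all zeros.
    """
    total = sum(counts.values())
    if total == 0:
        return {name: 0.0 for name in counts}
    return {name: count * 1_000_000 / total for name, count in counts.items()}


# Hypothetical counts for one tissue library (not project data)
library_counts = {"miR-a": 500, "miR-b": 1500}
print(reads_per_million(library_counts))
```

After this scaling, the value for a given miRNA can be compared between a shallow library and a deep one without the deeper library dominating simply because it has more reads.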

Publications

  • Development of a Relational Database for Studying the Content and Differential miRNA Expression within and across the Mammalian Species, 2008 (MS Thesis: Mark P O Connor, Thesis Director: Lakshmi K Matukumalli)
  • Master's Thesis: STUDY OF miRNA CONTENT (AND EVOLUTION) WITHIN THE MAMMALIAN SPECIES THROUGH DEVELOPMENT OF A RELATIONAL DATABASE AND BIOINFORMATICS ANALYSIS, Student Name: Mark O'Connor, Thesis Director: Lakshmi K Matukumalli, 2008
  • Master's Thesis: To be Submitted -- Clustering miRNA content across the Mammalian Species. 2012 (MS Thesis: Naga S Betrapally, Thesis Director: Don Seto and Huzefa Rangwala)
  • Tara G. McDaneld, Tim P.L. Smith, Matthew E. Doumit, Jeremy R. Miles, Luiz L. Coutinho, Tad S. Sonstegard, Lakshmi K. Matukumalli, Dan J. Nonneman, and Ralph T. Wiedmann MicroRNA expression profiles during swine skeletal muscle development BMC Genomics, 2009 Feb 10, 10:77
  • An atlas of bovine gene expression reveals novel distinctive tissue characteristics and evidence for improving genome annotation. Gregory P Harhay, Timothy PL Smith, Leeson J Alexander, Christian D Haudenschild, John W Keele, Lakshmi K Matukumalli, Steven G Schroeder, Curtis P Van Tassell, Cathy R Gresham, Susan M Bridges, Shane C Burgess and Tad S Sonstegard, 2010 Genome Biology Vol. 11, No. 10; R102.
  • Bovine Genome Sequencing and Analysis Consortium. The genome sequence of taurine cattle: a window to ruminant biology and evolution. Science. 2009 Apr 24, 324(5926):522-8
  • Identification of conserved regulatory elements in mammalian promoter regions: a case study using the PCK1 promoter. Liu GE, Weirauch MT, Van Tassell CP, Li RW, Sonstegard TS, Matukumalli LK, Connor EE, Hanson RW, Yang J. Genomics Proteomics Bioinformatics. 2008 Dec, 6(3-4):129-43.


Progress 05/01/09 to 04/30/10

Outputs
OUTPUTS: In the original grant we proposed making 50 miRNA libraries, each comprising 3,000 sequences (150,000 total sequences). Because of advances in sequencing technologies we have now been able to generate over 100 miRNA libraries and sequence them at very high depth on the Illumina next generation sequencing platform. The total sequencing output currently stands at 274,230,885 sequences, a several-fold increase over what was originally proposed. Using these data we have completed the miRNA expression profiling of the various tissues. One of the challenges for miRNA target prediction from the bovine genome was that the 3' UTR regions were not well defined, as the automated annotations were based on protein sequence alignments that completely miss the UTR regions. We have used the RNASeq data generated by our collaborators from 3 different tissues to annotate the UTR regions. This also helped in annotating the majority of the tags from the bovine gene atlas project. For this project we are currently using the bovine genome build from University of Maryland, UMD3.0. We have created a mysql database for the current miRBase miRNAs (Version 15) along with the miRNA target databases from MicroCosm and miRDB. We are currently in the process of building the web pages to enable queries of the miRNA profile in the bovine tissues as well as its targets. We were able to use the bovine gene atlas data and miRNA data in conjunction to analyze the inverse trends in miRNA/target mRNA expression profiles across the various developmental stages. We are also analyzing and validating the miRNA expression profile differences in the Horn/Poll tissues, for use in conjunction with published gene expression data to find inverse trends that identify possible targets. PARTICIPANTS: In addition to the PIs on this project, Dr. Tara McDaneld (USDA, MARC, Clay Center, NE) and Dr. Le Ann Blomberg (USDA, BARC, Beltsville, MD) helped in preparing the miRNA libraries. Dr. 
Tad Sonstegard (USDA, BARC Beltsville) has generously offered sequencing support on the Illumina Genome Analyzer in his laboratory at no additional cost to this project other than the sequencing reagents. Samriddhi Goswami (Bioinformatics PhD student at George Mason University (GMU)) assisted in the bovine genome annotations and in performing the differential gene expression analysis. David Millis (Bioinformatics PhD student at GMU) is doing his PhD thesis on refining the bovine miRNA-target mRNA predictions using an ensemble of features including data mining. Sean Smith (Bioinformatics PhD student at GMU) is helping to develop the gene regulatory networks using the bovine gene atlas and miRNA atlas datasets. Kshma Aswath (Bioinformatics PhD student at GMU) worked on the bovine piRNAs and is interested in pursuing this topic for her PhD dissertation. Shama Baig Shakeel (MS student in Bioinformatics) and Kalpana Dommaraju (PhD student in Bioinformatics) have recently signed up for a lab rotation on defining the bovine exome and performing the comparative miRNA expression profile analysis across various species. TARGET AUDIENCES: We provided bioinformatics support for gene expression analysis in embryonic stem cell tissues to Dr. Carol Keefer of the University of Maryland. We helped annotate the markers involved in major commercially important traits in cattle for Prof. Yang Da and his colleagues on another NRI project, and helped analyze bovine digital gene expression tags for researchers at Texas A&M University. We established a research collaboration with Dr. David Lynn at Teagasc, Ireland for developing gene regulatory networks using the bovine gene atlas data. Provision of the miRNA atlas data through a query interface will help other bovine genome researchers as well as others working in livestock species. 
PROJECT MODIFICATIONS: Funds remaining in the grant may allow additional RNASeq libraries to be prepared to further improve the bovine genome annotations.

Impacts
- Generated miRNA libraries for 100 cattle tissues
- Generated 274,230,885 sequences from the above tissues
- Annotated the cattle genes for 3' and 5' UTR regions using RNASeq data
- Improved the annotation of the bovine gene atlas tags that are mostly mapped to 3' UTR regions
- Created mysql databases for miRNA sequences and targets. A web interface for querying and visualization is currently under development
- In addition to miRNA, we have also generated the sequence data for piRNAs from fetal and adult testes
- Identified several upregulated miRs and correspondingly down-regulated target mRNAs and vice versa across the developmental stages in various tissues
- Analyzing the miRNA/mRNA expression profiles of Horn/Poll tissues
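
The search for inverse trends between a miRNA and its predicted target mRNAs can be sketched as a correlation filter. This is a minimal illustration assuming Pearson correlation across matched samples and a fixed cutoff; the report does not specify the statistic or threshold used, and all names and values below are hypothetical.

```python
from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation of two equal-length series (0.0 if either is constant)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def inverse_pairs(mirna_expr, mrna_expr, predicted_targets, threshold=-0.8):
    """Keep predicted miRNA/mRNA pairs whose expression is strongly anti-correlated.

    Assumption: expression vectors are ordered over the same samples
    (for example, the same developmental stages) for miRNA and mRNA.
    """
    kept = []
    for mirna, genes in predicted_targets.items():
        for gene in genes:
            r = pearson(mirna_expr[mirna], mrna_expr[gene])
            if r <= threshold:
                kept.append((mirna, gene, r))
    return kept


# Hypothetical expression across four stages (not project data)
mirna_expr = {"miR-x": [1, 2, 3, 4]}
mrna_expr = {"geneA": [8, 6, 4, 2], "geneB": [1, 2, 3, 4]}
predicted = {"miR-x": ["geneA", "geneB"]}
print(inverse_pairs(mirna_expr, mrna_expr, predicted))
```

Restricting the test to pairs that a target database already predicts keeps the number of comparisons manageable and ties each anti-correlated pair to a sequence-based rationale.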

Publications

  • The Bovine Gene Atlas: A Transcriptome Resource Revealing Novel Distinctive Tissue Characteristics and Evidence for Improving Genome Annotation Gregory P Harhay, Timothy PL Smith, Leeson J Alexander, Christian D Haudenschild, John W Keele, Lakshmi K Matukumalli, Steven G Schroeder, Curtis P Van Tassell, Cathy R Gresham, Susan M Bridges, Shane C Burgess and Tad S Sonstegard, 2010 ( In review)
  • 3 other manuscripts related to miRNA expression profiling and target analysis are currently under preparation (2010)


Progress 05/01/08 to 04/30/09

Outputs
OUTPUTS: Objective 1. Discover bovine miRNAs using bioinformatics and experimental methods. Three pooled samples containing multiple tissues were sequenced using next generation sequencing. This objective has been completed and a manuscript describing the conserved and novel microRNAs is under preparation. Objective 2. Generate tissue-specific miRNA expression profiles to complement mRNA expression studies. The protocols for RNA extraction, multiplexed sequencing, and processing using next generation sequencing have been streamlined. Using a combination of ABI sequencing and next generation sequencing, the following cattle tissue samples have been sequenced so far: liver (fetal and adult), testes (fetal and adult; piRNA and miRNA), jejunum (fetus), kidney (fetus), reticulum (fetus), reticulum (juvenile), hypothalamus (juvenile), spleen (fetus), omasum (juvenile), spinal cord (juvenile), medulla (juvenile), rumen, abomasum, ileum, hippocampus, thyroid, mammary and hornbud. In addition to these libraries, several tissues belonging to the muscular and nervous systems will be sequenced in early August 2009. Objective 3. Identify bovine-specific miRNA targets from a joint analysis of miRNA and mRNA sequence and expression data. The digital gene expression data are now being combined with RNASeq data from muscle and liver tissues from collaborators to obtain better gene models for miRNA target predictions. Also, tags from the Bovine Gene Atlas, bovine microRNA and RNASeq are being mapped to the latest assembly from University of Maryland, UMD3.0 (July 2009). A manuscript describing the data from the Bovine Gene Atlas project is being submitted shortly. Objectives 4 and 5: Website development and release of tools. The Bovine Gene Atlas is hosted at the AgBase website at Mississippi State University. A new version of the gbrowse is currently under development by a PhD student at GMU under the guidance of Dr. Matukumalli for jointly hosting the mRNA and miRNA data using the new UMD3.0 assembly. 
PARTICIPANTS: Tara G. McDaneld from US-MARC is helping with the experimental protocols for RNA preparation. Tad S. Sonstegard from BARC, Beltsville is partnering for sequencing on the next generation sequencer. Brian Dalrymple from CSIRO, Australia is one of our collaborators in data analysis. Luiz L. Coutinho, University of São Paulo, Brazil is participating in the data analysis. Shane Burgess from Mississippi State University is web hosting the Gene Atlas data. GMU faculty Huzefa Rangwala and Don Seto are participating in the RNASeq data analysis. David Millis, a PhD student from GMU, is currently working with Dr. Lakshmi Matukumalli on developing gene regulatory networks in cattle using mRNA and microRNA data. Samriddhi Goswami, a PhD student, is developing a cattle gbrowse for hosting mRNA and miRNA expression data on the new version of the UMD assembly. She is also developing a new algorithm that uses DGE to predict accurate gene models. TARGET AUDIENCES: Nothing significant to report during this reporting period. PROJECT MODIFICATIONS: Nothing significant to report during this reporting period.

Impacts
Understanding of the transcriptional control of gene expression is expanding with the finding of new classes of small RNAs such as piRNAs and TSSa-RNAs. The role of anti-sense transcripts in gene regulation is also coming to light. In light of these new findings, the research objectives are being tailored to capture these new classes of RNAs. We have already sequenced the piRNAs from fetal and mature bull testes and are analyzing the results. Cataloging the findings will be a great resource for future bovine genomics research. We are also analyzing the data to generate new hypotheses on gene regulation in various organ systems. We identified 121 miRNA sequences by cloning from five different tissues, whereas one next generation sequencing run identified 234 miRNAs from the same tissues after pooling. Hence we are now using next generation sequencing for the remaining tissues. We have optimized a new protocol that exploits the increased throughput of next generation sequencing by tagging libraries for multiplexing, which has allowed us to prepare additional libraries, at significantly higher depth, with the funding provided. We have so far generated more than 100 million next generation sequence reads from the various tissues listed above. A software pipeline to process these reads and store them in a relational database has been established. For analyzing mRNA and miRNA interactions and understanding the important biological mechanisms, it is essential to have accurate gene models. The following efforts are paving the way. The latest version of the bovine genome assembly (UMD 3.0, July 2009) from University of Maryland placed most of the genome in chromosomes, and from our preliminary studies it is in very good agreement with the high density linkage map. This will allow us to accurately map the cattle genes that have so far suffered from misassembly problems. 
We are collaborating with University of Missouri, Columbia and Robert Li from BFGL and have gained access to RNASeq library data for liver and abomasum. These data are currently being mapped to the genome to improve the gene models. One of the graduate students (Samriddhi) is developing a cattle gbrowse for hosting mRNA and miRNA expression data on the new version of the UMD assembly. She is also developing a new algorithm that uses DGE to predict accurate gene models, and another student (David Millis) is using systems biology approaches, combining gene expression and text mining, to identify gene regulatory networks in cattle.
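
The tagging approach that allows multiplexing, mentioned in the Impacts above, relies on sorting pooled reads back into their source libraries. The sketch below assumes each read begins with a fixed-length tag at its 5' end and that only exact tag matches are accepted; the report does not describe the actual tag design, and the tag sequences and library names are hypothetical.

```python
from collections import defaultdict


def demultiplex(reads, tags):
    """Assign each read to a library by its leading tag and strip the tag.

    Assumptions: `tags` maps a tag sequence to a library name, the tag is
    the first bases of the read, and mismatches are not tolerated. Reads
    with no recognized tag go to the "unassigned" bin.
    """
    by_library = defaultdict(list)
    for read in reads:
        for tag, library in tags.items():
            if read.startswith(tag):
                by_library[library].append(read[len(tag):])
                break
        else:
            # No tag matched this read
            by_library["unassigned"].append(read)
    return dict(by_library)


# Hypothetical tags and reads (not project data)
tags = {"ACGT": "liver", "TGCA": "rumen"}
reads = ["ACGTTGAGG", "TGCAAAAC", "GGGGCCCC"]
print(demultiplex(reads, tags))
```

In practice tags are usually chosen so that a single sequencing error cannot turn one tag into another, which this exact-match sketch does not attempt to model.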

Publications

  • Tara G. McDaneld, Tim P.L. Smith, Matthew E. Doumit, Jeremy R. Miles, Luiz L. Coutinho, Tad S. Sonstegard, Lakshmi K. Matukumalli, Dan J. Nonneman, and Ralph T. Wiedmann MicroRNA expression profiles during swine skeletal muscle development BMC Genomics, 2009 Feb 10, 10:77
  • Bovine Genome Sequencing and Analysis Consortium. The genome sequence of taurine cattle: a window to ruminant biology and evolution. Science. 2009 Apr 24, 324(5926):522-8
  • Identification of conserved regulatory elements in mammalian promoter regions: a case study using the PCK1 promoter. Liu GE, Weirauch MT, Van Tassell CP, Li RW, Sonstegard TS, Matukumalli LK, Connor EE, Hanson RW, Yang J. Genomics Proteomics Bioinformatics. 2008 Dec, 6(3-4):129-43.
  • Development of a Relational Database for Studying the Content and Differential miRNA Expression within and across the Mammalian Species, 2008 (MS Thesis: Mark P O Connor, Thesis Director: Lakshmi K Matukumalli)


Progress 05/01/07 to 04/30/08

Outputs
OUTPUTS: Objective 1: In-silico analysis of miRNA in the draft genome sequence has been completed. Two pooled miRNA samples from diverse tissues have been sequenced, which helped validate the miRNAs discovered in the in silico studies. Status: Completed. Objective 2: 20 miRNA libraries have already been constructed, and sequencing on a next generation sequencer was performed on 6 of these libraries. Library construction and sequencing are currently underway for the rest of the tissues. Status: In progress. Objective 3: The mRNA sequence data from the gene atlas project have been populated into a database and are currently being analyzed. The connections between the mRNA and miRNA data have not yet been established. The in-silico pipeline for data analysis is currently being constructed, and data analysis will be performed when miRNA data are available for all tissue samples. Status: In progress. PARTICIPANTS: 1. Dr. Tara Gayle McDaneld, ARS Scientist, Meat and Animal Research Center, Clay Center, Nebraska. Effort: Construction of miRNA libraries. 2. Dr. Tad Sonstegard, ARS Scientist, Bovine Functional Genomics Laboratory, Beltsville, Maryland. Effort: Sequencing of miRNA libraries on next-generation sequencer. 3. Mark O'Connor, Bioinformatics Student, George Mason University, Virginia. Effort: Comparative genome analysis of miRNA across human, mouse and cattle as part of Master's thesis. 4. Brian Dalrymple, CSIRO, Australia. Effort: Collaborative research plans for data analysis were discussed at the Plant and Animal Genome meeting in San Diego, CA. These will be executed once all the sequence data from the project become available. TARGET AUDIENCES: Not relevant to this project. PROJECT MODIFICATIONS: In the grant we proposed sequencing the miRNA on the ABI3730 platform. We could substantially increase the amount of sequencing by switching to the next generation sequencing platform, owing to the availability of an Illumina Genome Analyzer at the USDA-ARS Beltsville location.

Impacts
The In-silico analysis and sequencing of small RNA has resulted in the discovery and expression profile of several homologous and novel miRNAs expressed in the bovine tissues. These results are being analyzed for writing several high-impact publications.

Publications

  • Master's Thesis: STUDY OF miRNA CONTENT (AND EVOLUTION) WITHIN THE MAMMALIAN SPECIES THROUGH DEVELOPMENT OF A RELATIONAL DATABASE AND BIOINFORMATICS ANALYSIS, Student Name: Mark O'Connor, Thesis Director: Lakshmi K Matukumalli, 2008