Identification of Expression QTL Associated With Feed Efficiency in Beef Cattle

Sponsoring Institution

National Institute of Food and Agriculture

Project Status

COMPLETE

Funding Source

AFRI COMPETITIVE GRANT

Reporting Frequency

Annual

Accession No.

1021808

Grant No.

2020-67015-30829

Cumulative Award Amt.

$500,000.00

Proposal No.

2019-05998

Multistate No.

(N/A)

Project Start Date

Jun 1, 2020

Project End Date

May 31, 2024

Grant Year

2020

Program Code

[A1231]- Animal Health and Production and Animal Products: Improved Nutritional Performance, Growth, and Lactation of Animals

Recipient Organization
UNIVERSITY OF MISSOURI
(N/A)
COLUMBIA,MO 65211

Performing Department
Animal Science

Non Technical Summary
The U.S. has the fourth largest cattle population in the world by head count yet is the largest beef producer. There is significant genetic variation between animals in their ability to convert inputs (feed) into final product (growth): some animals grow well while eating very little, while others grow poorly while eating a lot. The cost of feed represents up to 75% of the direct cost of beef production. For the U.S. to continue to lead in beef production and to increase its profitability, production efficiency must be optimized. Our overall objective is to identify DNA variants influencing gene expression that are linked to variation in feed efficiency of cattle, and to include these variants on commercially available assays so that producers can more accurately select the most efficient animals. To achieve this, we will profile gene expression in three relevant tissues in a large number of animals selected for extremes of feed efficiency. We expect that the products of this research will enhance our understanding of the genetic and genomic mechanisms underlying the complex trait of feed efficiency and serve as a model for future work in related traits. Indeed, a better understanding of the biological mechanisms responsible for quantitative variation will help create the "map" that can be used by many scientific disciplines, from nutritionists, physiologists, and geneticists to genome engineers, enabling the scientific community to help accomplish the grand challenge of advancing our nation's ability to achieve global food security and fight hunger.

Animal Health Component

10%

Research Effort Categories

Basic

90%

Applied

10%

Developmental

Classification

Knowledge Area (KA)	Subject of Investigation (SOI)	Field of Science (FOS)	Percent
303	3310	1080	50%
304	3310	1080	50%

Knowledge Area
303 - Genetic Improvement of Animals; 304 - Animal Genome;

Subject Of Investigation
3310 - Beef cattle, live animal;

Field Of Science
1080 - Genetics;

Keywords

Goals / Objectives
Our overall objective is to identify putative causal variants influencing gene expression and transcript utilization linked to phenotypic variation of feed efficiency of cattle. The objectives of this proposal are to:
1) Transcriptome profile 107 hypothalamus, 107 small intestine and 150 liver tissues from animals selected to have extreme residual feed intake phenotypes.
2) Generate variant calls from the transcriptome data to integrate with SNP chip genotype data to better represent observed variation in the transcriptome profiled samples.
3) Perform alternative splicing analysis and generate intron/exon counts for all genes in the genome to allow eQTL analyses.
4) Impute SNP chip genotypes on 11,000 animals with measurements of feed efficiency to 850k and whole genome sequence to enable GWAS for feed efficiency related traits.
5) Perform eQTL analysis on the 107/150 transcriptomes using imputed sequence genotypes to identify variants associated with gene expression, splicing and exon usage.
6) Integrate eQTL and GWAS results to identify variants most likely to be biologically causal and predictive of feed efficiency related phenotypes for inclusion on future generations of commercially available genotyping assays.

Project Methods
Data generation: Objective 1
RNA-seq: We define a transcriptome as a single tissue from a single animal. We will collect a total of 253 transcriptomes to augment the 111 transcriptomes already generated and described in the preliminary data. We have identified three tissues for which we will generate stranded RNA-seq libraries, and we will pool 85 libraries in equimolar amounts to be sequenced on the Illumina NovaSeq S4 flowcell using paired-end 100 bp (2x100 bp) reagents. On average, this will yield 30M fragments (60M reads) per library. This work will generate 107 transcriptomes each for hypothalamus and small intestine and 150 transcriptomes for liver. Using the large number of available transcriptomes, we will investigate the alternate isoforms produced within and across tissues.
We will perform an eQTL analysis and a transcript splice variant GWAS for each tissue. Expression profiles and intron/exon counts for each sample or gene, respectively, will be used with the imputed WGS SNP variation to identify variants that are associated with gene expression, alternate transcript abundance or intron/exon usage. Initial expression analysis will focus on gene-level expression data for genes annotated by Ensembl and use the new ARS-UCD1.2 reference genome. Additional analyses will be performed at the isoform and individual intron/exon level.
Imputation: We will impute the chip genotypes of all 11,000 animals to the level of genome sequence using our established imputation pipeline and the 1000 Bulls data or our own internal reference data.
GWAS: While imputed genotypes for tens of millions of variants across 11,000 samples by their very nature contain enormous information content, they also present significant computational issues. First, because this dataset represents animals from multiple purebred and crossbred populations, one must account for breed composition and background population structure to avoid false positive associations.
Our breed composition pipeline CRUMBLER will provide estimates of breed composition for all animals that can be used either to partition the animals a priori into homogeneous groups for separate analyses or as covariates in the analysis to account for breed composition. Second, due to the immense size of the data and the need to account for population structure and relatedness to control false positives, the signal for true positives also tends to be reduced. While we have described several software packages by name, we will not restrict our analyses to those described and will use the tools that are best suited to and capable of analyzing the data generated.
Integration: The primary analysis phase produces the "catalog" of data needed for integration. The integration phase will draw on all analyses and will integrate the data produced with existing data from the PDs. GWAS and eQTL analyses can each produce potentially thousands of significant results. Here, we refer to a significant result as a single variant that surpasses a genome-wide threshold based on the appropriate analysis-specific method for controlling for multiple testing. We seek to answer the basic question: which GWAS variants are also eQTL variants? Importantly, we do not expect all GWAS variants to be eQTL variants and vice versa. Because of the power of integrating GWAS and eQTL results, many new software tools are currently being developed to integrate these data. Because this is such a new field and methods are being rapidly developed, the scientific community has not reached a consensus on the "best" tool(s) for the job. As new tools are developed, we will assess their applicability to the data from this project and use whichever tools enable the best biological insight.
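The basic integration question stated above (which GWAS variants are also eQTL variants?) can be illustrated, in its simplest form, as a set intersection on variant identifiers. The sketch below is only an illustration under stated assumptions and is not the project's integration method: the function name `overlap_gwas_eqtl`, the variant IDs, and the gene names are all hypothetical.

```python
from collections import defaultdict


def overlap_gwas_eqtl(gwas_hits, eqtl_hits):
    """Return {variant_id: sorted gene list} for variants significant in both analyses.

    gwas_hits: iterable of variant IDs that passed the GWAS threshold.
    eqtl_hits: iterable of (variant_id, gene_id) pairs that passed the eQTL threshold.
    """
    # Group the genes each eQTL variant is associated with.
    genes_by_variant = defaultdict(set)
    for variant_id, gene_id in eqtl_hits:
        genes_by_variant[variant_id].add(gene_id)
    # Keep only variants that are also GWAS hits.
    shared = set(gwas_hits) & set(genes_by_variant)
    return {v: sorted(genes_by_variant[v]) for v in sorted(shared)}


# Hypothetical inputs for illustration only.
gwas = ["chr1:1000", "chr2:2000", "chr3:3000"]
eqtl = [("chr2:2000", "GENE_A"), ("chr2:2000", "GENE_B"), ("chr5:5000", "GENE_C")]
shared = overlap_gwas_eqtl(gwas, eqtl)
print(shared)
```

An exact-identifier match of this kind ignores linkage disequilibrium between neighboring variants, which is one reason dedicated integration tools are of interest.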

Progress 06/01/20 to 05/31/24

Outputs
Target Audience: We expect that the products of this research will enhance our understanding of the genetic and genomic mechanisms underlying the complex trait of feed efficiency and serve as a model for future work in related traits, and thus will be of interest to nutritionists, physiologists and those performing genomic research. Furthermore, these data can be used by the FAANG Consortium to enhance the annotation of the bovine genome and provide a link between genetic variants and gene expression. Finally, variants likely to be causal will be shared with genotype service providers for inclusion on future genotyping assays, thus providing an avenue for use by beef cattle producers.
Changes/Problems: Nothing Reported
What opportunities for training and professional development has the project provided? Ph.D. student Stull presented several lectures during the spring 2024 semester, teaching transcriptome analysis methods to graduate students, postdocs and faculty in the Division of Animal Sciences.
How have the results been disseminated to communities of interest? 4/16/2024: Ph.D. student Stull presented poster 305, titled "Encouraging Reproducibility: A Bioinformatic Pipeline for Scaling Single-Cell/Single-Nuclei RNA-seq Analyses", at the 2024 AGBT-Ag meeting, Phoenix, AZ. 5/10/2024: Ph.D. student Stull presented poster 247, titled "Encouraging Reproducibility: A Bioinformatic Pipeline for Scaling Single-Cell/Single-Nuclei RNA-seq Analyses", at the 2024 Biology of Genomes meeting, Cold Spring Harbor, NY.
What do you plan to do during the next reporting period to accomplish the goals? Nothing Reported

Impacts
What was accomplished under these goals?
Objective 1) We generated snRNA-seq data for the hypothalamus from the same three high-efficiency and three low-efficiency samples as for the liver. A single-nuclei processing pipeline was developed to annotate cell types and was used to create a transcriptome profile for each annotated cell type. In liver, these were hepatocytes, endothelial cells, Kupffer cells, B cells, T cells, stellate cells, and cholangiocytes. In hypothalamus, these were oligodendrocytes, microglial cells, oligodendrocyte precursors, neurons, astrocytes, endothelial cells, Bergmann glial cells, and ependymal cells. Deconvolution methods are being evaluated to obtain gene expression counts at cell-type resolution from bulk transcriptomes.
Objective 2) Nothing to report.
Objective 3) Intron/exon counts were generated for molecular phenotype QTL analysis using the generated pipeline. For liver, 181 bulk transcriptomes were processed, producing molecular phenotype counts for 15,655 genes, 231,298 exons, 130,261 exon inclusion ratios, and 113,795 intron excision ratios. For hypothalamus, 102 bulk transcriptomes were processed, producing molecular phenotype counts for 17,257 genes, 284,838 exons, 135,649 exon inclusion ratios, and 117,329 intron excision ratios. For small intestine, 98 bulk transcriptomes were processed, producing molecular phenotype counts for 17,475 genes, 285,741 exons, 148,938 exon inclusion ratios, and 129,241 intron excision ratios.
Objective 4) Genome-wide association analyses were performed with GCTA-MLMA for 5 feed efficiency-related phenotypes (average daily gain (ADG) N=11,312, metabolic mid-weight (MMW) N=11,312, dry matter intake (DMI) N=11,279, residual feed intake (RFI) N=11,279, and feed conversion ratio (FCR) N=10,585) using 791,443 genotypes with a MAF > 0.01. Location, season, year, feeding pen, treatment, sex, and days on feed were used as covariates. The numbers of variants meeting an FDR of 5% were ADG=483, MMW=1,266, DMI=68, RFI=160, and FCR=154.
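The report does not state which procedure was used to control the FDR at 5% in the GWAS. As a minimal sketch, assuming the widely used Benjamini-Hochberg step-up procedure, variant selection at a given FDR could look like the following; the function name and the p-values are hypothetical and are not project data.

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Return the indices of tests declared significant at the given FDR."""
    m = len(p_values)
    # Sort test indices by ascending p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= (k / m) * fdr.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            cutoff_rank = rank
    # Every test ranked at or below the cutoff is declared significant.
    return sorted(order[:cutoff_rank])


# Hypothetical p-values for ten variants.
p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
significant = benjamini_hochberg(p, fdr=0.05)
print(significant)
```

Because the procedure is step-up, a variant can be declared significant even if its own p-value exceeds its rank-specific threshold, provided a higher-ranked p-value meets its threshold.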
Objective 5) Molecular phenotype QTL analysis was performed for liver (N=181), hypothalamus (N=102), and small intestine (N=98) using imputed sequence genotypes called from RNA-seq. The xQTL pipeline generated a GRM and calculated PEER factors for each molecular phenotype tissue cohort. Association testing in liver resulted in 2,457 genes (758,041 variants), 7,770 exons (1,113,149 variants), 5,379 exon inclusion ratios (1,351,053 variants), and 11,099 intron excision ratios (1,881,194 variants) with at least one significantly associated variant. For hypothalamus, 1,021 genes (195,671 variants), 6,791 exons (1,206,957 variants), 6,777 exon inclusion ratios (993,187 variants), and 8,956 intron excision ratios (1,290,895 variants) had at least one significantly associated variant. For small intestine, 554 genes (121,188 variants), 2,785 exons (476,723 variants), 3,573 exon inclusion ratios (680,332 variants), and 7,770 intron excision ratios (1,113,149 variants) had at least one significantly associated variant.
Objective 6) Work on methods for integrating xQTL results with GWAS results is ongoing. An integration pipeline is being developed to perform statistical fine-mapping using the SuSiE algorithm as implemented in the PolyFun package.

Publications

Progress 06/01/22 to 05/31/23

Outputs
Target Audience: Nothing Reported
Changes/Problems: Nothing Reported
What opportunities for training and professional development has the project provided? Two graduate students have received training in bioinformatics methods for processing genomic data.
How have the results been disseminated to communities of interest? 10/19/2022: PD Schnabel presented a talk titled "Basic animal genetics and managing variants" at the American International Charolais Association Breed Improvement Committee meeting at the Kansas City Airport Marriott. 1/06/2023: PD Schnabel presented a talk titled "The basic biology and technology behind the genomic EPD" at the 55th Annual Missouri Cattle Industry Convention and Trade Show.
What do you plan to do during the next reporting period to accomplish the goals? All analyses will be concluded in the final year of the no-cost extension.

Impacts
What was accomplished under these goals?
2) Variant calling from the generated transcriptomes was completed.
3) The full analysis pipeline for bulk RNA-seq was finalized and tested. This includes WASP filtering during sequence processing, calculation of the various molecular phenotypes (gene expression, exon expression, exon inclusion ratio and intron excision ratio) and final analysis with GCTA-MLMA, including the GRM and PEER factors as covariates. All of this has been automated for speed and reproducibility.
4) A final WGS phasing reference panel was developed that includes 6,137 genomes at approximately 211 million sites. The variants were phased using SHAPEIT5 after extensive optimization of parameters. RNA-seq and/or SNP-chip based genotypes from assayed samples will be imputed to this reference for use in eQTL analysis.
5) Nothing to report.
6) Nothing to report.
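The GRM mentioned in item 3 summarizes genome-wide similarity between pairs of animals. The report does not give the settings used, so the following is only a sketch, assuming the standardized-genotype formulation commonly used for this purpose, A_jk = (1/N) * sum_i (x_ij - 2p_i)(x_ik - 2p_i) / (2p_i(1 - p_i)), with genotypes coded 0/1/2. The function name and the genotypes are hypothetical.

```python
def genomic_relationship_matrix(genotypes):
    """Compute a GRM from genotypes coded 0/1/2 (rows = animals, columns = variants).

    Each variant is centered by 2p and scaled by sqrt(2p(1 - p)), where p is the
    allele frequency estimated from the same animals. Monomorphic variants
    (p == 0 or p == 1) are skipped because their scale is zero.
    """
    n_animals = len(genotypes)
    n_variants = len(genotypes[0])
    standardized = []  # one list of per-animal values for each usable variant
    for v in range(n_variants):
        column = [row[v] for row in genotypes]
        p = sum(column) / (2 * n_animals)
        if p == 0.0 or p == 1.0:
            continue
        scale = (2 * p * (1 - p)) ** 0.5
        standardized.append([(x - 2 * p) / scale for x in column])
    if not standardized:
        raise ValueError("no polymorphic variants")
    used = len(standardized)
    # Average the products of standardized genotypes over the usable variants.
    return [
        [sum(z[j] * z[k] for z in standardized) / used for k in range(n_animals)]
        for j in range(n_animals)
    ]


# Hypothetical genotypes: 3 animals x 3 variants (the last variant is monomorphic).
geno = [
    [0, 2, 0],
    [1, 0, 0],
    [2, 1, 0],
]
grm = genomic_relationship_matrix(geno)
for row in grm:
    print(row)
```

In practice, dedicated software computes this matrix from millions of variants; the pure-Python loop here is meant only to make the arithmetic explicit.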

Publications

Progress 06/01/21 to 05/31/22

Outputs
Target Audience: Nothing Reported
Changes/Problems: The main issue has revolved around the Covid pandemic and the ability to recruit a Ph.D. student to work on the project.
What opportunities for training and professional development has the project provided? Two graduate students have received training in bioinformatics methods for processing genomic data.
How have the results been disseminated to communities of interest? Nothing Reported
What do you plan to do during the next reporting period to accomplish the goals? With the addition of a Ph.D. student devoted to the project, we expect to complete the objectives in the final year.

Impacts
What was accomplished under these goals?
1) Transcriptome data generation was completed in 2021. We leveraged NRSP8 cattle coordinator funding to produce single-nuclei RNA-seq (snRNA-seq) data for liver from three high-efficiency and three low-efficiency samples.
2) All transcriptome data from the Sequence Read Archive (SRA) were downloaded to augment the eQTL analysis and variant calling (N>8000).
3) A Ph.D. student devoted to the project started in August 2021. He has evaluated WASP filtering within the STAR alignment software and is building the pipeline to produce transcriptome counts and perform alternative splicing analysis.
4) Progress was made in developing the pipeline for imputation to genome sequence. Variant calls have been made using over 5000 genomes to produce the haplotype reference for WGS imputation. The SNP chip imputation reference panel was improved by additional quality control to exclude misplaced markers and correct some marker locations.
5) Nothing to report.
6) Nothing to report.

Publications

Type: Journal Articles Status: Accepted Year Published: 2022 Citation: Qanbari S, Schnabel RD and Wittenburg D. Evidence of Rare Misassemblies in the Bovine Reference Genome Revealed by Population Genetic Metrics. Animal Genetics. 2022;00:1-8. https://doi.org/10.1111/age.13205

Progress 06/01/20 to 05/31/21

Outputs
Target Audience: Nothing Reported
Changes/Problems: Nothing Reported
What opportunities for training and professional development has the project provided? Most conferences were either moved online or cancelled due to COVID-19.
How have the results been disseminated to communities of interest? Nothing Reported
What do you plan to do during the next reporting period to accomplish the goals? Objectives 1-4 are scheduled to be completed in year 2 of the project.

Impacts
What was accomplished under these goals? All of this work was significantly delayed by the COVID-19 pandemic, initially due to campus closure/restrictions and later due to reagent acquisition delays.
1) Transcriptome profile 107 hypothalamus, 107 small intestine and 150 liver tissues from animals selected to have extreme residual feed intake phenotypes. We identified all of the samples needed for this project and retrieved them from our freezer system. Testing of RNA extraction procedures was done in 2020. Production RNA extractions were initiated in January 2021, and data generation for the first 100 samples was completed on 4/14/2021. The remaining 250 RNA extractions were completed in May 2021 and submitted for data generation.
4) Impute SNP chip genotypes on 11,000 animals with measurements of feed efficiency to 850k and whole genome sequence to enable GWAS for feed efficiency related traits. Significant progress was made on building an imputation pipeline to efficiently impute SNP-chip genotypes. We began exploring different approaches to increase imputation accuracy at both the SNP-chip and sequence levels.

Publications

Type: Journal Articles Status: Published Year Published: 2020 Citation: Bickhart DM, McClure JC, Schnabel RD, Rosen BD, Medrano JF, Smith TPL. Advances in sequencing technology herald a new frontier in cattle genomics and genome-enabled selection. 2020. Journal of Dairy Science 103:6, 5278-5290. https://doi.org/10.3168/jds.2019-17693
Type: Journal Articles Status: Published Year Published: 2020 Citation: Triant DA, Le Tourneau JJ, Unni DR, Diesh CM, Shamimuzzaman M, Walsh AT, Gardiner J, Goldkamp A, Li Y, Nguyen H, Roberts C, Zhao Z, Alexander LJ, Decker JE, Schnabel RD, Schroeder SG, Sonstegard TS, Taylor JF, Rivera RM, Hagen DE, Elsik CG. Using Online Tools at the Bovine Genome Database to Manually Annotate Genes in the New Reference Genome. Anim Genet. 2020;51(5):675-682. https://doi.org/10.1111/age.12962
Type: Journal Articles Status: Published Year Published: 2020 Citation: Silva DBS, Fonseca LFS, Pinheiro DG, Magalhães AFB, Muniz MMM, Ferro JA, Baldi F, Chardulo LAL, Schnabel RD, Taylor JF, Albuquerque LG. Spliced genes in muscle from Nelore Cattle and their association with carcass and meat quality. 2020. Sci Rep 10, 14701.
Type: Journal Articles Status: Published Year Published: 2021 Citation: Wang X, Ju Z, Jiang Q, Zhong J, Liu C, Wang J, Hoff JL, Schnabel RD, Zhao H, Gao Y, Liu W, Wang L, Gao Y, Yang C, Hou M, Huang N, Regitano LCA, Porto-Neto LR, Decker JE, Taylor JF, Huang J. Introgression, admixture, and selection facilitate genetic adaptation to high-altitude environments in cattle. 2021. Genomics. https://doi.org/10.1016/j.ygeno.2021.03.023