Composition of the GI microbiota and predisposition to enterohemorrhagic E. coli colonization as complex polygenic traits in beef cattle

Sponsoring Institution

National Institute of Food and Agriculture

Project Status

COMPLETE

Funding Source

AFRI COMPETITIVE GRANT

Reporting Frequency

Annual

Accession No.

0224446

Grant No.

2011-67005-30060

Cumulative Award Amt.

$2,354,004.00

Proposal No.

2010-04449

Multistate No.

(N/A)

Project Start Date

Feb 15, 2011

Project End Date

Feb 14, 2017

Grant Year

2011

Program Code

[A4111]- Food Safety: Microbial Ecology and Shiga toxin-producing Escherichia coli (STEC) Shedding in Cattle

Recipient Organization
UNIVERSITY OF NEBRASKA
(N/A)
LINCOLN,NE 68583

Performing Department
Food Science & Technology

Non Technical Summary
Shiga-toxigenic Escherichia coli (STEC) comprise genetically diverse populations of E. coli that share the common characteristic of producing Shiga toxins. STEC, including the most common EHEC lineage in North America, the O157:H7 lineage, are routinely found in healthy cattle, sheep, goats and swine. They rarely cause disease in adult animals, which serve instead as reservoirs for zoonotic transmission to humans, often through contaminated foods. Technologies derived from epidemiologic risk factor modification, intervention field trials and STEC/EHEC molecular biology have shown only non-repeatable or weak effects on this organism's prevalence when evaluated in livestock naturally infected with EHEC O157 under field conditions. Thus, livestock producers currently have no science-based recommendations available for on-farm STEC and EHEC prevalence reduction. It is our long-term goal to mitigate the public health risk posed by EHEC and STEC in cattle by minimizing the number of animals that become shedders of these organisms. Because EHECs exist within complex microbial communities that colonize the bovine GI tract, we will investigate these communities in association with high or low levels of EHEC shedding ("shedder" phenotype) on the one hand, and the animal's genotype on the other. We will first identify correlations between specific features of microbial community composition or functionality and the "shedder" phenotype. We will then analyze the animals' genomes for regions and candidate genes that influence the patterns of microbial colonization in the gut. Aside from the insight into the biological underpinnings of EHEC shedding, our studies promise to discover genes and markers associated with supershedders and thus equip animal breeders with a new tool for minimizing their number.

Animal Health Component

(N/A)

Research Effort Categories

Basic

100%

Applied

(N/A)

Developmental

(N/A)

Classification

Knowledge Area (KA)	Subject of Investigation (SOI)	Field of Science (FOS)	Percent
304	3310	1080	40%
304	3310	1103	40%
712	3310	1080	10%
712	3310	1103	10%

Knowledge Area
712 - Protect Food from Contamination by Pathogenic Microorganisms, Parasites, and Naturally Occurring Toxins; 304 - Animal Genome;

Subject Of Investigation
3310 - Beef cattle, live animal;

Field Of Science
1103 - Other microbiology; 1080 - Genetics;

Keywords

high-throughput sequencing

metagenomics

quantitative genetics

Goals / Objectives
Studies of EHEC shedding have shown that environmental factors alone do not explain variation in levels of EHEC shedding within a population. Evidence suggests that composition of the GI microbiota can be viewed as a complex trait whose expression depends on a combination of environmental and host genetic factors. Without knowing the host contribution, it is impossible to predict an individual's predisposition to being colonized by EHEC. We hypothesize that species composition of the bovine GI microbiota (including EHEC) is a complex, polygenic trait and that GWAS strategies can be used to identify genomic loci that influence microbiome composition and EHEC shedding. We will:
1. Phenotype a large animal population (n=1,400) at three sites (MARC, CSU, Texas A&M) for EHEC shedding, microbiome composition, feed intake, etc.; assess microbiome composition with 16S rRNA pyrosequencing.
2. Collect genotypic data with a large bovine SNP panel and perform QTL analysis, analyzing genotypic and phenotypic data using linkage disequilibrium QTL mapping strategies for GWAS.
3. Further localize QTL using a combination of eQTL and localized high-density mapping with Bovine Oligonucleotide microarrays and dense SNPs within the QTL regions.
Our innovative approach allows us to identify the host genomic loci (QTL) that control microbiome composition and EHEC shedding and to measure the relative strength of each QTL's signal. Aside from the insight into the biological underpinnings of EHEC shedding, our studies promise to discover genes and markers associated with supershedders and thus equip animal breeders with a new tool for minimizing their number.

Project Methods
First, we will employ a combination of phenotyping methods to define the associations of EHEC shedding with compositional features of the colonic microbiota, asking whether particular features of the microbiota are associated with predisposition to shedding. Animals (n=1,400) will be phenotyped for gut microbiota composition and E. coli O157:H7 presence. Gut microbiota composition will be assessed by 454-based sequencing of 16S amplicons derived from fecal microbial DNA. Presence of STEC will be determined by culturing methods developed at USMARC. A further 700 samples will undergo metagenomic analysis by sequencing on the 454 or PacBio platform to identify the functional content of the gut microbiota. Second, relative abundances of EHEC, other individual colonic taxa, and/or features of the microbiota associated with shedding (or resistance to shedding) will be tested for genome-wide association with dense panels of markers using quantitative trait locus (QTL) analysis. All 1,400 animals will be genotyped using the Illumina BovineSNP50 BeadChip. Finally, we will begin to assess the underlying pathways associated with super-shedder QTL using expression QTL (eQTL) analysis. eQTL studies will be performed on tissue from the rectal-anal junction, removed from animals at slaughter. The outcomes of these experiments will be: (1) a data set for analyzing interactions between EHEC and members of the bovine colonic microbiota, in which deposited sequence information will be quality-controlled according to parameters established at CAGE and tested with rigorous statistical analyses; and (2) a determination of whether host genetics plays a role in colonization by EHEC and development of the super-shedder phenotype. This information will be immediately relevant to breeders, and will be accompanied by data about associated markers and candidate genes.

Progress 02/15/11 to 02/14/17

Outputs
Target Audience: Nothing Reported

Changes/Problems: Nothing Reported

What opportunities for training and professional development has the project provided?
The project trained one postdoctoral fellow (Dr. Rohita Sinha), who was recently hired onto the UNL faculty in the Department of Food Science and Technology as a Research Assistant Professor. In addition, the highly interdisciplinary nature of the project provided substantial professional development opportunities for all key project personnel. The quantitative geneticists (Dr. Larry Kuehn, Dr. Warren Snelling, Dr. Stephen Kachman) gained extensive experience in working with the microbiome, in particular in learning how to view the microbiome as a hierarchical set of (very) complex traits. Similarly, the microbiologists (Dr. Andrew Benson, Dr. Jim Bono, Dr. Jim Wells) gained substantial experience in learning to measure the effect of individual (host) genetics on different features of the microbiome.

How have the results been disseminated to communities of interest?
The individual genotype data and metagenome data will be deposited in the NCBI Sequence Read Archive (SRA) in December 2017. Results to this point have been disseminated in four different publications and in presentations at large conferences (Meat Science and Muscle Biology Symposium, American Society for Microbiology, International Association for Food Protection). At least one additional publication is planned that will describe results from the final GWAS and MWAS analyses.

What do you plan to do during the next reporting period to accomplish the goals?
Nothing Reported

Impacts
What was accomplished under these goals?
To adequately power the study, animals from the MARC herd were sampled over a five-year period in cohorts of 250-350 different individuals each year. The cohorts included heifers and steers that are admixtures of the 16 most common beef cattle breeds in the US. All animals were genotyped to a density of either 55,000 SNPs (1,097 animals) or 770,000 SNPs (400 animals: 200 pairs of sire + 1 progeny), with haplotypes for the lower-density genotypes imputed from the HD 770,000-SNP data. To measure EHEC shedding and compositional features of the colonic microbiome, each animal was sampled rectally 7 times during the high-shedding season (July-September). E. coli O157:H7 was enumerated at each time point by culture-based methods and qPCR. For microbiome analyses, equal portions (by weight) of the colonic samples from each of the seven time points were composited for each animal. Total microbial DNA from the composites was subsequently used for full shotgun metagenome sequencing. High-quality genotype and metagenomic data were obtained from a total of 1,327 animals. Preliminary Genome-Wide Association Studies (GWAS) and Microbiome-Wide Association Studies (MWAS) were subsequently performed on the initial data set of 1,047 animals from 4 cohorts. GWAS on the full data set (all 1,327 animals) is pending. The preliminary GWAS defined 24 significant QTLs that affect STEC O157:H7 shedding traits as well as certain taxa of the colonic microbiome. Final GWAS are being tested using two approaches: first, conducting the analysis on the full data set from 1,327 animals; and second, using the preliminary subset of 1,047 animals as a discovery population and the remaining 280 animals as a validation population.

Detailed Progress

Metagenome sequencing
Microbiome phenotyping by shotgun metagenome sequencing was performed on a HiSeq 2500 using 2 x 100 paired-end sequencing.
The animals in each pool were randomized across sample year, gender, and diet to minimize run effects. The assembly-free algorithm in the Kaiju package was used to measure the taxonomic content and relative abundances of each taxon across all of the animals. Kaiju identified 1,324 taxa that were shared across a minimum of 75% of the animals, and the relative abundances of these taxa were subsequently used for MWAS and GWAS analyses.

Preliminary Microbiome-Wide Association Studies (MWAS)
Preliminary MWAS analyses were done to identify data structures (taxonomic features or functional features) within the metagenomic data that are associated with STEC O157:H7 shedding. STEC O157:H7 shedding (which was measured at 7 time points per animal) showed substantial variation within animal and is best characterized as sparse. Given the non-normal distribution, non-parametric approaches and machine learning were best suited to MWAS. These approaches defined significant positive or negative associations with a small number of taxa, but the effect sizes were relatively small and the significant taxa were low-abundance, comprising less than 0.05% of the total microbiome across most animals. Machine-learning approaches also defined small sets of taxa having predictive capacity in training data sets, but these similarly had relatively low predictive power (correct 70% of the time) and likewise were driven by low-abundance taxa. Thus, it appears that the primary taxonomic configuration of the colonic microbiome (dominant taxa) has no significant impact on shedding of STEC O157:H7. A small number of low-abundance taxa may be associated, but low-abundance taxa (even when present in most animals of the study) are statistically much less reliable due to sampling error. Pending statistical analyses will further determine how many of those may be false discoveries due to sampling.
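The prevalence filter mentioned above (taxa shared across a minimum of 75% of the animals) can be illustrated with a minimal sketch. This is not the project's pipeline: the function name `prevalent_taxa`, the table layout, the animal and taxon labels, and the rule that "shared" means a relative abundance above zero are all assumptions made for illustration.

```python
def prevalent_taxa(abundance, min_fraction=0.75):
    """Return taxa detected in at least min_fraction of the animals.

    abundance: dict mapping animal ID -> dict of taxon -> relative abundance.
    A taxon counts as detected when its abundance is above zero (assumption).
    """
    n_animals = len(abundance)
    detected = {}
    for taxa in abundance.values():
        for taxon, value in taxa.items():
            if value > 0:
                detected[taxon] = detected.get(taxon, 0) + 1
    return sorted(t for t, n in detected.items() if n >= min_fraction * n_animals)


# Hypothetical toy table: four animals, three taxa, made-up abundances.
toy = {
    "animal_1": {"taxon_A": 0.020, "taxon_B": 0.300, "taxon_C": 0.0},
    "animal_2": {"taxon_A": 0.015, "taxon_B": 0.250, "taxon_C": 0.001},
    "animal_3": {"taxon_A": 0.030, "taxon_B": 0.280},
    "animal_4": {"taxon_A": 0.0, "taxon_B": 0.310, "taxon_C": 0.0},
}
kept = prevalent_taxa(toy)
print(kept)
```

In this toy table, taxon_A is detected in three of four animals and so just meets the 75% cutoff, while taxon_C is detected in only one animal and is dropped.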
In multiple studies of host-microbiome interactions, functional features of the microbiome generally show stronger associations with, and larger effect sizes on, host characteristics. However, when we used functional traits of the microbiome for MWAS studies with STEC O157:H7 shedding, we found even less statistical support for association with shedding than was observed with taxa. Thus, we conclude that the dominant functional configuration of the colonic microbiome also has little impact on shedding characteristics.

Preliminary Genome-Wide Association Studies (GWAS)
To determine whether genetic variation in the host contributes to STEC O157:H7 shedding characteristics and/or composition of the colonic microbiome, the STEC O157:H7 shedding data and the taxonomic abundances from corresponding metagenomic data were used as "traits" in genetic analyses. A combination of heritability estimates and GWAS was used to assess overall genetic effects and the effects of genetic variation at individual loci. Data from the initial 1,047 animals were used as a "discovery" set in preliminary analyses. Final analysis of the 1,327 animals is pending.

STEC O157:H7 shedding traits and taxonomic composition of the colonic microbiome are heritable.
Genomic REML heritability estimates were obtained using genomic relationships from imputed BovineHD SNP genotypes. Sex, year-season, breed composition and heterozygosity were included in the model as fixed effects, and univariate analysis was used for each shedding trait and taxonomic trait. Of the 87 taxa and 6 shedding traits that were analyzed, 14 taxa and 2 of the shedding traits showed heritability values >0.1. The highest heritability was observed for Butyrivibrio (h² = 0.21) and for the STEC O157:H7 shedding trait "total animal prevalence" (h² = 0.10). Thus, taxonomic abundances of the colonic microbiome and STEC O157:H7 shedding appear to be lowly heritable traits, measurable in statistically powered experiments but with low effect sizes.
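For readers less familiar with genomic REML, a univariate model of the kind described above is commonly written in the following textbook form. The symbols are assumptions introduced here for illustration and are not taken from the project's own analysis: y is the vector of trait values (a shedding trait or a taxon abundance), β holds the fixed effects (sex, year-season, breed composition, heterozygosity), u holds the animals' genomic values, and G is the genomic relationship matrix built from the SNP genotypes.

```latex
\begin{aligned}
\mathbf{y} &= \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\mathbf{u} + \mathbf{e}, \\
\mathbf{u} &\sim N\!\left(\mathbf{0},\, \mathbf{G}\,\sigma^2_u\right), \qquad
\mathbf{e} \sim N\!\left(\mathbf{0},\, \mathbf{I}\,\sigma^2_e\right), \\
h^2 &= \frac{\sigma^2_u}{\sigma^2_u + \sigma^2_e}
\end{aligned}
```

Under this form, a heritability of 0.21 would mean that roughly a fifth of the variance remaining after the fixed effects are accounted for is attributed to additive genomic differences among animals.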

Variation in STEC O157:H7 shedding traits and taxonomic composition of the colonic microbiome is associated with 24 different QTLs.
Genome-wide association studies (GWAS) were performed by measuring effects of individual SNPs across the genome for each of the shedding and taxonomic traits. The SNP effects were measured by solving genomic BLUP for the EBVs (estimated breeding values). QTL regions were defined as 2Mb regions surrounding SNPs showing significant association with shedding and/or taxonomic traits (the 2Mb width is based on the spacing of BovineHD variants in LD with significant SNPs). From this analysis, 24 different QTLs that affect shedding levels of STEC O157:H7 were identified across the genome. More than half of these QTLs are pleiotropic, affecting both shedding traits and one or more taxa of the microbiome.

QTLs for STEC shedding are enriched for keratin and cytoskeletal functions.
Using 2Mb windows of SNPs in LD with SNPs associated with STEC O157:H7 shedding traits, the gene functions in these regions (based on GO terms) were tested for enrichment (overrepresentation). Of the 330 genes within the 2Mb windows of the 24 QTL regions, 97 have functions associated with keratin and cytoskeletal filament biosynthesis, suggesting that genetic variation in cytoskeletal components could contribute to shedding characteristics of individual animals. Given that attaching and effacing lesion formation by STEC O157:H7 has substantial effects on the cytoskeleton (e.g., pedestal formation), and that several of the secreted effectors interact directly with cytoskeletal components, it is remarkable that we can detect the influence of host genetic variation in these same components.
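As a rough illustration of how fixed-width QTL regions can be derived from significant SNPs, the sketch below centres a 2 Mb window on each SNP and merges windows that overlap on the same chromosome. The centring, the merge rule, the function name `qtl_regions`, and the SNP positions are all assumptions; the report does not state how its windows were placed or combined.

```python
def qtl_regions(significant_snps, window_bp=2_000_000):
    """Merge fixed-width windows centred on significant SNPs into regions.

    significant_snps: iterable of (chromosome, position_bp) tuples.
    Returns a sorted list of (chromosome, start_bp, end_bp) tuples.
    """
    half = window_bp // 2
    windows = sorted(
        (chrom, max(0, pos - half), pos + half) for chrom, pos in significant_snps
    )
    regions = []
    for chrom, start, end in windows:
        if regions and regions[-1][0] == chrom and start <= regions[-1][2]:
            # Overlaps the previous window on the same chromosome: extend it.
            prev = regions[-1]
            regions[-1] = (chrom, prev[1], max(prev[2], end))
        else:
            regions.append((chrom, start, end))
    return regions


# Hypothetical significant SNPs as (chromosome, position in bp).
hits = [(5, 10_500_000), (5, 11_200_000), (5, 40_000_000), (12, 500_000)]
regions = qtl_regions(hits)
for region in regions:
    print(region)
```

The two nearby SNPs on chromosome 5 collapse into one region, which is one simple way a list of significant SNPs can yield a smaller number of distinct QTLs.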

Publications

Type: Journal Articles Status: Accepted Year Published: 2017 Citation: M. Kim, L. A. Kuehn, J. L. Bono, E. D. Berry, N. Kalchayanand, H. C. Freetly, A. K. Benson and J. E. Wells. 2017. Journal of Applied Microbiology. Accepted manuscript online: 23 JUL 2017 11:20PM EST | DOI: 10.1111/jam.13545

Progress 02/15/12 to 02/14/13

Outputs
Target Audience: The project at this point is primarily relevant to the scientific community, and is most relevant to those interested in food safety and zoonotic diseases.

Changes/Problems: The originally proposed 16S amplicon-based phenotyping of the gut microbiome on the Roche 454 platform was no longer a viable option and created a major methodological roadblock. As described in the Scientific Progress/Discovery section above, we have successfully overcome this potentially crippling circumstance (brought on solely by Roche) by shifting sequencing strategy and developing the necessary computational infrastructure to capitalize on high-throughput metagenome sequencing, which is now affordable on the Illumina HiSeq platform.

What opportunities for training and professional development has the project provided?
Bioinformatics: Three different postdoctoral researchers are working on this project in collaboration with the team of scientists. Through our bi-weekly meetings, the postdocs have an audience with the entire team of interdisciplinary investigators. To this point, the team has developed and validated two different metagenomics data processing pipelines (one for microbial taxa and the other for function).
Statistical Analysis: In addition to robust training in development of bioinformatics pipelines, the postdocs are also gaining experience with biostatistics and analysis of complex, high-dimensional data.

How have the results been disseminated to communities of interest?
Invited presentations where portions of this project have been discussed in public:
"Agricultural microbiomes and the global good", Gates Global Good LLC Meeting, Bill and Melinda Gates Foundation, Seattle, WA, July 9, 2012. (Brainstorming conference with Gates Foundation Scientific Advisory Group)
"Host genetic control of the gut microbiome by modulation of microbes with keystone-like characteristics", Gordon Research Conference on Quantitative Genetics, Galveston, TX, February 20, 2013. (Invited, International Conference)
"Do we raise cattle or bugs? Exploring the symbiotic relationship between cattle and their microbes", Beef Improvement Federation, Oklahoma City, OK, June 14, 2013. (Plenary speaker, National Conference, Stakeholders)
"Modeling the convergence of deterministic factors that shape the gut microbiome through quantitative genomics", 4th International Human Microbiome Congress, Hangzhou, China, September 14, 2013. (Plenary speaker, International Conference)
"Host genetics and modulation of the gut microbiome as a means to understanding metabolic function", Symposium on Population-based animal models for discovery of complex traits, American Society for Human Genetics 2013, Boston, MA, October 23, 2013. (Plenary speaker, International Conference)
"Assembly of the gut microbiome is shaped by convergence of genetic and environmental factors", Michigan State University, Microbiology and Molecular Genetics Seminar, December 3, 2013. (Invited seminar speaker)
"Prospects for diagnosis by sequencing", Bovine Respiratory Disease Symposium 2014, Denver, CO, July 31, 2014. (Plenary speaker, National Conference)
"Guts, germs, and glutamate: an emerging picture of host-driven enrichments for gut microbiome composition and function", National Animal Disease Center, Ames, IA, August 11, 2014. (Invited lecture)

What do you plan to do during the next reporting period to accomplish the goals?
Data analysis will be the main emphasis for the next reporting period, in particular further modeling of STEC shedding as "traits" and further discovery of how these traits are associated with the microbiome and host genetics. We will also generate expression measurements from the rectal-anal junction tissue and use these measurements to augment the association studies as eQTL data.

Impacts
What was accomplished under these goals?

Animal population and genotyping: Samples to date have been collected from >1,200 animals. Genotyping has been completed for 1,197 animals, with 897 animals genotyped on the bovine 50K chip and 150 pairs of sire + 1 progeny genotyped on the 770K HD chip (300 animals total). These HD animals will serve as the reference for imputing the 50K data, improving the precision of genotype prediction across the entire population.

Microbiome phenotyping: Microbiome phenotyping by shotgun metagenome sequencing is complete for 954 animals. Taxonomic abundances were estimated against a reference set of metagenomic contigs built by de novo assembly from each sample. This yielded 87 genera with >10Kb of pooled contig length that were found across at least 75% of the animals. To quantify each of these genera, reads for each animal were mapped onto the reference set and counted. Functional abundances were estimated by a new algorithm we developed that uses frequency-weighting to assign function based on KEGG/E.C. ontology. Using this algorithm, we are now able to make functional assignments to >25% of the reads, which is far better than the 10-15% we were capturing with other pipelines.

Preliminary Microbiome-Wide Association Studies (MWAS) with STEC shedding
Preliminary MWAS analyses were done using simple correlation and linear regression to determine correlation structures within the microbiome data and to identify linear relationships between the taxonomic or functional content of the colonic microbiome and the level of STEC shedding (which was measured at 7 time points per animal).

STEC shedding is not linearly related to any individual taxon
Cluster analysis grouped animals based on shedding patterns into two broad groups: animals that shed >8,000 CFU/g at any single time point (shedders) and animals that did not shed at this level at any time point (non-shedders). Based on these two groups, we then implemented a systematic search for association between STEC shedding pattern and microbiome composition. At this point, only weak associations have been observed between STEC shedding and any single taxon or group of taxa. Given the significant amount of variation in the microbiome that is observed in this data set, we are currently refining this analysis by removing taxa, functions, and animals that have low values (essentially absent) for a given taxonomic or functional feature and then rerunning the analysis for the individualized data sets.

Non-linear relationships between STEC shedding and microbiome composition can be discovered through machine-learning algorithms
Using the same categorical classification of animals into two groups (shedders = animals that shed >8,000 CFU/g at any single time point; non-shedders = animals that shed <8,000 CFU/g at all time points), Wilcoxon rank sum tests were used within the shedder and non-shedder sample groups to rank the taxa for each sample based on relative abundances. The U statistic was then calculated from pair-wise combinations of all samples in each group based on the number of samples where a given taxon shares a similar rank. Under these criteria, 42 taxa were discriminatory for the shedder and non-shedder categorization. Data from all 42 of these taxa across all samples were next used to train a set of classifiers within the WEKA package (http://www.cs.waikato.ac.nz/ml/weka/). The classifiers were trained on a subset of 90% of the animals and then tested by a jackknife strategy against random pools of 90% of the animals. Overall, each of the trained classifiers was >75% correct in categorizing STEC shedding profile based solely on microbiome composition. The Random Forest classifier performed the best, with an accuracy rate >87%. Training and testing of these same classifiers using the functional data is now in progress.
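The shedder/non-shedder split and a rank-based comparison between the two groups can be sketched as follows. This is a minimal illustration, not the project's procedure: it uses the standard Mann-Whitney U statistic for a single taxon rather than the pair-wise ranking scheme described above, the animal records and abundances are made up, and treating a count of exactly 8,000 CFU/g as non-shedding is an assumption because the report leaves that boundary case undefined.

```python
SHEDDER_THRESHOLD_CFU_PER_G = 8000


def is_shedder(counts_cfu_per_g, threshold=SHEDDER_THRESHOLD_CFU_PER_G):
    """True if the animal exceeded the threshold at any single time point."""
    return any(count > threshold for count in counts_cfu_per_g)


def mann_whitney_u(group_a, group_b):
    """U statistic for group_a: number of (a, b) pairs with a > b, ties as 0.5."""
    u = 0.0
    for a in group_a:
        for b in group_b:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u


# Hypothetical records: seven counts per animal plus one taxon's abundance.
animals = {
    "A": {"counts": [0, 0, 12000, 0, 0, 300, 0], "taxon": 0.004},
    "B": {"counts": [0, 0, 0, 0, 0, 0, 0], "taxon": 0.001},
    "C": {"counts": [50, 9000, 0, 0, 0, 0, 0], "taxon": 0.003},
    "D": {"counts": [0, 8000, 0, 0, 0, 0, 0], "taxon": 0.002},
}
shedders = [a["taxon"] for a in animals.values() if is_shedder(a["counts"])]
non_shedders = [a["taxon"] for a in animals.values() if not is_shedder(a["counts"])]
u = mann_whitney_u(shedders, non_shedders)
print(len(shedders), len(non_shedders), u)
```

A U value equal to the product of the two group sizes, as in this toy data, means every shedder has a higher abundance of the taxon than every non-shedder; values near half that product indicate no separation.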

Publications

Progress 02/15/11 to 02/14/12

Outputs
OUTPUTS: In order to capitalize on the higher-capacity DNA sequencing platforms and avoid continued issues with the Roche 454 platform and reagent streams, a metagenomic sequencing strategy was developed for phenotyping the colonic microbiome of the bovine population. The strategy is based on deep metagenome sequencing of a subset of samples, followed by assembly of genomic segments from the abundant organisms in each sample. Genomic segments are then grouped by their compositional features and by their phylogenetic relationships to each other and to genomes of known organisms in order to assign taxonomic status. Contigs from taxa that are shared across the subset of samples are then used as a reference database, and sequence reads from the remaining large number of samples are mapped onto this reference to quantify this core set of taxa across all individual samples. The entire metagenomic assembly and mapping pipeline has now been completed and validated using a pre-existing human data set (from the MetaHIT Consortium) and a newly generated murine data set from 300 animals that are part of an NIH-funded study.

PARTICIPANTS:
Andrew Benson, Project Director
Rohita Sinha, Postdoctoral Researcher, programming of the new metagenome assembly-mapping pipeline
Jim Wells, USDA-MARC, bovine sampling, STEC microbiology
Jim Bono, USDA-MARC, bovine sampling, STEC microbiology
Larry Kuehn, USDA-MARC, animal population, quantitative genetics, genotyping

TARGET AUDIENCES: Nothing significant to report during this reporting period.

PROJECT MODIFICATIONS: As described in the outcomes, modifications in sequencing strategy, sequencing platform, and bioinformatic data processing were developed to circumvent performance issues with the Roche 454 sequencing platform. While these changes in methodology were significant, they still allow us to measure the same type of phenotypic characteristics of the microbiome as originally proposed. Namely, we can still quantify the abundant taxa in the colonic microbiome as well as their functional characteristics. Therefore, this represents only a change in analytical platform and does not significantly affect the overall project aims or objectives.

Impacts
The major impact of this year's progress was to overcome a significant limitation posed by the demise of the Roche 454 sequencing platform. The original study design relied on this platform and even had built-in plans for circumventing modifications to the sequencing platform. However, the changes to the platform made in 2012 by Roche were so significant that the platform was no longer able to support microbiome analysis by 16S amplicon sequencing as it had for the Titanium platform. Our new sequencing strategy and bioinformatics platforms circumvent this problem, allowing us to phenotype all animals using the high-capacity Illumina HiSeq platform and still meet the objectives of the proposed studies on time and within budget.

Publications

No publications reported this period