Source: COLORADO STATE UNIVERSITY submitted to
IMPROVING OUR UNDERSTANDING OF THE ECOLOGY OF ANTIMICROBIAL RESISTANCE IN FOOD PRODUCTION USING BAYESIAN MODEL AND MACHINE LEARNING APPROACH
Sponsoring Institution
National Institute of Food and Agriculture
Project Status
TERMINATED
Funding Source
Reporting Frequency
Annual
Accession No.
1008469
Grant No.
2016-67012-24679
Project No.
COLV2015-03538
Proposal No.
2015-03538
Multistate No.
(N/A)
Program Code
A7201
Project Start Date
Jan 15, 2016
Project End Date
Jan 14, 2017
Grant Year
2016
Project Director
Noyes, N.
Recipient Organization
COLORADO STATE UNIVERSITY
(N/A)
FORT COLLINS, CO 80523
Performing Department
Clinical Sciences
Non Technical Summary
Antimicrobial resistance (AMR) is a pressing public health concern with ramifications for food production, particularly meat and poultry. Our group has recently adopted a next-generation sequencing, metagenomics-based approach to researching AMR in livestock production. This sequencing technology allows us to access the DNA of the entire bacterial community within a given sample, thus enabling investigation of the complex microbial ecosystem in which AMR exists. However, extracting meaningful patterns from such sequence data requires advanced statistical methods. The goal of this proposal is to use advanced statistical methods such as hierarchical Bayesian modeling and machine learning to uncover patterns of association between livestock production strategies (such as antimicrobial use practices) and AMR. To this end, we will identify, optimize, validate and apply existing Bayesian and machine learning tools to three metagenomic datasets generated from studies of AMR in livestock production. Outcomes of these activities include open-source statistical analysis methodologies for use by other agricultural scientists grappling with complex research data, as well as identification of important and actionable drivers of AMR in livestock production systems. These outcomes directly address the food safety program area by providing evidence-based results that can be used to formulate effective AMR mitigation interventions and policies.
Animal Health Component
20%
Research Effort Categories
Basic
45%
Applied
10%
Developmental
45%
Classification

Knowledge Area (KA) | Subject of Investigation (SOI) | Field of Science (FOS) | Percent
711 | 3910 | 2080 | 100%
Goals / Objectives
Antimicrobial resistance (AMR) is a critical public health issue. Infections with resistant pathogens are estimated to cause an additional 8 million hospitalization days, and methicillin-resistant Staphylococcus aureus (MRSA) infections alone caused 9,650 deaths in the US in 2011. Meat production systems are thought to contribute to the problem by harboring a reservoir of AMR that interfaces with humans either through persistence in the food chain or through dissemination of wastes into the environment. Antimicrobial use in food-producing animals is often cited as a driver of AMR in food production, but this blanket statement fails to recognize the extremely varied contexts in which such use occurs.
Unfortunately, scientific research has been largely unable to provide consistent guidance as to which specific use practices require modification in order to protect public health. Our group has been researching antimicrobial use in food animals for over a decade. We have typically determined resistance status either by whether indicator organisms grow in antimicrobial-impregnated cultures, or by the presence of PCR-amplified fragments in sample DNA. Using these methods, we have conducted many large, well-designed prospective cohort studies of antimicrobial use and resistance in commercial settings. Despite the meticulous thought that went into these studies, we continue to grapple with anomalous and often contradictory results. One study involving feedlot cattle found weak positive associations between tetracycline exposures and resistance, while another showed increasing prevalence of tetracycline resistance in the absence of any tetracycline use. These are but two examples from a large body of literature that can best be described as consistently inconsistent.
Within this context, it is unsurprising that policy makers and producers alike are hard-pressed to formulate evidence-based policies for AMR mitigation.
Given our experience in this area, we came to believe that culture- and PCR-based methods were inadequate for identifying and understanding use-resistance patterns that occur within a complex ecosystem. For example, cattle feces contain thousands of bacterial taxa; in this context, studies that generalize AMR findings gleaned from one or two indicator bacteria to the whole ecosystem commit an ecological fallacy. PCR-based methods may circumvent this problem, but they typically focus on one or two antimicrobial drug classes, while many cross-class resistance determinants travel frequently and readily between distantly related bacteria. Given these shortcomings, we have adopted a shotgun metagenomics approach as a potential means of providing more consistent and actionable answers to the question of AMR in food production. Through two studies, we have identified over 350 unique AMR genes in cattle production systems, a degree of diversity that is impossible to capture using culture- and PCR-based approaches. We have found that different biomes, but not different cattle production systems, harbor distinct AMR profiles. Preliminary evidence also suggests that currently utilized harvest interventions not only reduce pathogen load, but may also mitigate transmission of AMR into retail beef. Further, our data show a decrease in AMR diversity throughout cattle production, perhaps indicating selective pressure on bacterial populations.
These findings are immensely intriguing, yet barely scratch the surface of what we could find using more advanced statistical and computational methods. Unfortunately, these methods have only just begun to find their way into microbiome research, and have not yet been applied to metagenomics projects in agriculture.
Specifically, there is an urgent need to apply advanced statistical and computational methods to finding complex associations between metadata factors (such as management practices and antimicrobial use) and patterns of AMR in metagenomic sequence data, as well as an urgent need for people who can appropriately use these methods. The goal of this project is to understand drivers of AMR by uncovering complex patterns of interaction between multiple ecosystems operating at different levels: from the genes, to the bacteria, to the host, to the environment, and to the management practices utilized in food production. We propose to adapt existing Bayesian modeling and machine learning methods to achieve this goal. We have now assembled the necessary ingredients: metagenomic and related datasets, a validated bioinformatics pipeline, sufficient computational capacity, existing statistical methods, and individuals with expertise in how to apply them. Now begins the hard work of identifying the most appropriate and useful statistical methods for our data, and adapting them to the specific nature of metagenomic data and the specific question of AMR in food production. Therefore, the specific objectives are:
Objective #1 - Identify a set of tools that can be used to execute hierarchical Bayesian modeling and machine learning methods on metagenomic datasets.
Objective #2 - Test, optimize and validate the tools identified in Objective #1 using our own metagenomic datasets.
Objective #3 - Apply optimized tools to new and existing metagenomic data to uncover novel associations between genes, microbes, host, environment and management practices that influence AMR dynamics in food production systems.
Project Methods
Bayesian Modeling
The datasets generated from comprehensive metagenomic-based research projects are inherently hierarchical, with multiple levels of complexity that need to be modeled together. Hierarchical Bayesian models explicitly assume such a structure, and can therefore account for within- and between-level interactions (28-30). In addition, their ability to directly incorporate overdispersion and multi-source variation makes Bayesian models particularly well-suited to metagenomic data, which typically contain both. Hierarchical Bayes is also extremely flexible: it can be used for everything from analysis of variance (ANOVA) designs to generalized linear mixed effects models and state-space time-series analyses (31). Indeed, Dr. Abdo recently performed the latter analysis on a large vaginal microbiome dataset (16).
To implement hierarchical Bayesian modeling, we will first standardize the data for use with open-source Bayesian software packages, including jags (Just Another Gibbs Sampler) (32), OpenBUGS (33), WinBUGS (34) and the R package "MCMCpack" (35). A more systematic and comprehensive survey of available software will be undertaken as part of Objective #1. To facilitate Objective #2, model specification will be customized to each dataset and research question, and will be performed iteratively. Appropriate parameterization for each sampler will be tested and validated using convergence diagnostics as implemented in the "coda" package in R (36) (Objective #2). Finally, for each research question, an optimal method, model and parameterization will be selected, and model outputs will be transferred into R data format so that sophisticated graphical software can be used to interpret and display model results (Objective #3).
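The convergence diagnostics mentioned above can be illustrated with the basic Gelman-Rubin potential scale reduction factor (R-hat). This statistic compares between-chain and within-chain variance for a parameter sampled by several independent MCMC chains. The Python sketch below is illustrative only. The function name is hypothetical, and it computes the original, uncorrected statistic. It does not reproduce coda's `gelman.diag`, which also applies a degrees-of-freedom correction and reports an upper confidence limit.

```python
import math
import random
from statistics import mean, variance


def gelman_rubin_rhat(chains):
    """Basic (uncorrected) Gelman-Rubin potential scale reduction factor.

    chains: list of m >= 2 equal-length lists of MCMC draws (length n >= 2)
    for a single parameter. Illustrative sketch, not coda's gelman.diag.
    """
    m = len(chains)
    n = len(chains[0])
    if m < 2 or n < 2 or any(len(c) != n for c in chains):
        raise ValueError("need at least 2 chains of equal length >= 2")
    chain_means = [mean(c) for c in chains]
    grand_mean = mean(chain_means)
    # Between-chain variance B, scaled by the chain length n
    b = n / (m - 1) * sum((cm - grand_mean) ** 2 for cm in chain_means)
    # Within-chain variance W: average of the per-chain sample variances
    w = mean([variance(c) for c in chains])
    # Pooled estimate of the posterior variance
    var_plus = (n - 1) / n * w + b / n
    return math.sqrt(var_plus / w)


if __name__ == "__main__":
    rng = random.Random(42)
    # Four chains sampling the same distribution: R-hat should be close to 1
    mixed = [[rng.gauss(0.0, 1.0) for _ in range(2000)] for _ in range(4)]
    # Two pairs of chains stuck around different values: R-hat well above 1
    stuck = [[rng.gauss(mu, 1.0) for _ in range(2000)] for mu in (0.0, 0.0, 5.0, 5.0)]
    print(round(gelman_rubin_rhat(mixed), 3))
    print(round(gelman_rubin_rhat(stuck), 3))
```

Values close to 1 are consistent with the chains having mixed. A commonly cited rule of thumb treats values above about 1.1 as a sign that sampling should continue or that the model should be reparameterized.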
Outcomes from this part of the research approach will be: streamlined, comprehensive and validated hierarchical Bayesian modeling approaches for investigating AMR in metagenomic studies; and identification of patterns of AMR associated with specific metadata and contextualized within multi-level ecosystems.
Machine Learning
Machine learning offers another approach for describing stochastic systems and for linking multi-level data to specific outcomes of interest (e.g. AMR). In this project, we will focus on supervised (rather than unsupervised) machine learning methods, which are better suited to inferential analyses and have been applied to microbiome studies (37). Importantly, these methods can account for interactions between members or genes in a microbial ecosystem (38,39). To fulfill Objective #1, we will investigate several of the more widely used supervised methods, focusing on those found to perform well on metagenomic data (40,41): at a minimum, support vector machines, random forests, and Bayesian networks. These approaches will be optimized and validated (Objective #2) through nested cross-validation and performance metrics including the proportion of correct classifications, relative classifier information (42), and the ratio of baseline to observed classification error. Finally, optimized method(s) and parameters will be applied to our metagenomic and metadata datasets (Objective #3). Outcomes will include optimized machine learning analysis pipelines better adapted to the unique nature and large size of metagenomic data; validated parameterization for these methods; and state-of-the-art representation of complex patterns of interaction between AMR, microbial, host, environmental and management ecosystems.
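The performance metrics named above can be computed from the true and predicted class labels pooled across the outer folds of a nested cross-validation. The following Python sketch assumes categorical labels, such as hypothetical "R"/"S" resistance calls per sample. The function names are illustrative. Two definitions are our assumptions rather than quotations from the cited work. First, the baseline error is taken as the error of always predicting the most frequent class. Second, relative classifier information is computed as the reduction in entropy of the true class given the predicted class, divided by the entropy of the true class.

```python
import math
from collections import Counter, defaultdict


def accuracy(y_true, y_pred):
    """Proportion of correctly classified samples."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def baseline_error(y_true):
    """Error of a trivial classifier that always predicts the most frequent class."""
    majority_count = Counter(y_true).most_common(1)[0][1]
    return 1 - majority_count / len(y_true)


def baseline_to_observed_error_ratio(y_true, y_pred):
    """Values above 1 mean the classifier beats the majority-class baseline."""
    observed = 1 - accuracy(y_true, y_pred)
    return math.inf if observed == 0 else baseline_error(y_true) / observed


def _entropy(counts):
    """Shannon entropy (bits) of a discrete distribution given as counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)


def relative_classifier_information(y_true, y_pred):
    """(H(Y) - H(Y | Y_pred)) / H(Y): share of class uncertainty removed by the classifier."""
    h_true = _entropy(list(Counter(y_true).values()))
    if h_true == 0:
        return 0.0
    # Tally the true labels separately for each predicted label
    true_by_pred = defaultdict(Counter)
    for t, p in zip(y_true, y_pred):
        true_by_pred[p][t] += 1
    n = len(y_true)
    h_cond = sum(
        sum(c.values()) / n * _entropy(list(c.values())) for c in true_by_pred.values()
    )
    return (h_true - h_cond) / h_true


if __name__ == "__main__":
    y_true = ["R", "R", "R", "S"]
    y_pred = ["R", "R", "S", "S"]
    print(accuracy(y_true, y_pred))
    print(baseline_to_observed_error_ratio(y_true, y_pred))
    print(round(relative_classifier_information(y_true, y_pred), 3))
```

For these toy labels, accuracy is 0.75 and the baseline-to-observed error ratio is 1.0, which is no better than always predicting "R". The relative classifier information is nonetheless about 0.38, because every prediction of "R" is correct. This is one reason to report more than one metric.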

Progress 01/15/16 to 01/14/17

Outputs
Target Audience: The following target audiences were reached during this project:
  • Other scientists, particularly those involved in:
    - antibiotic resistance research
    - livestock production
    - use of shotgun metagenomic sequencing for microbiome-resistome research
    - development of computer and statistical algorithms for analysis of shotgun metagenomic data
  • Livestock producers, including:
    - feedlot operators
    - slaughterhouse owners
    - cow-calf producers
  • Students, including:
    - undergraduate and graduate computer science students who are helping to implement computer algorithms for microbiome-resistome analysis
    - graduate students involved in microbiome-resistome research in livestock production
    - veterinary students involved in livestock production and population health
  • Other USDA NIFA pre- and post-doctoral fellows
Changes/Problems: Nothing Reported
What opportunities for training and professional development has the project provided? Thanks to this project, I was able to undertake the following training and professional development opportunities:
  • Data Visualization for High-Dimensional Data, led by data visualization experts from the University of Denver Center for Data Visualization and Statistics. July 11th, 2016, Fort Collins, Colorado. ½-day of theory and hands-on training. I organized and funded this workshop as part of my post-doctoral fellowship, and offered it to colleagues and collaborators.
  • Mining Microbial Genomes and Metagenomes for Biotechnological Applications, led by staff from the Joint Genome Institute (Department of Energy) during the ASM/Microbe Conference, June 18th, 2016, Boston. Hands-on instruction on use of the JGI IMG suite of online tools and databases.
  • Genome-wide Association Studies and Comparative Genomics for Tracking Multi-resistant and Hypervirulent Bacterial Clones, taught by multiple academic experts in bacterial population genetics during the ASM/Microbe Conference, June 19th, 2016, Boston. Comprehensive theoretical overview and hands-on practice with population genetic analysis of microbial populations.
  • Introductory Network Analysis, led by Drs. Zvonimir Poljak and Olaf Berke at the Department of Population Medicine at Ontario Veterinary College, May 18th - 20th, 2016. Intensive 5-day instruction on use of network analysis in epidemiological investigations.
In addition, I was able to improve my science communication skills through the following opportunities:
  • Podcast interview, eLife. Topic: Publication of article "Resistome diversity in cattle and the environment decreases during cattle production." Available at: https://elifesciences.org/podcast/episode28
  • Written interview, MedicalResearch.com. Topic: Publication of article "Resistome diversity in cattle and the environment decreases during cattle production." Available at: http://medicalresearch.com/author-interviews/study-finds-no-antibiotic-resistant-genes-in-meat-products-shipped-to-groceries/22592/
  • Blog interview, Dr. Richard Raymond's Feedstuffs.com blog. Topic: "Shotgun metagenomics: Not just a pretty face." Available at: http://feedstuffsfoodlink.com/blogs-shotgun-metagenomics-pretty-face-commentary-10799
  • Radio interview, KNEB radio. Topic: Antimicrobial resistance in livestock production. Two 3-minute segments produced. Available as mp3 files upon request.
Finally, I was able to complete a Graduate Teaching Certificate program, which included 20 hours of hands-on teaching experience, 12 credits in teaching and pedagogy, and completion of a Teaching Portfolio (available on request).
Short courses/seminars completed:
  • Best Practices for Online Course Design (3-week online course)
  • Threading Information Literacy Throughout Course Curricula (4-week hybrid course)
  • An Introduction to Audio/Visual Methods for Student Learning (4 hours of in-class lecture)
  • More than Passive Listeners: Peer Instruction in the Lecture Setting (2-hour seminar)
  • Overcoming barriers to learning in large STEM classrooms (2-hour seminar)
  • Humanizing the Classroom (2-hour seminar)
  • Using Canvas Quiz Statistics to Create Stronger Exams (1-hour seminar)
  • What is Learning Analytics: An Introduction for Everyone (1-hour seminar)
  • Survey Feedback in Online Courses: How do we get it? How do we use it? (1-hour seminar)
  • Reflections on an instrument to provide quantitative feedback on teaching (1-hour seminar)
How have the results been disseminated to communities of interest? Results have been disseminated through the following channels:
  • Journal articles
  • Conference presentations
  • Invited speaker presentations (including extension talks, conference keynotes, and expert panels)
  • Media interviews (podcasts, blog posts)
  • Press coverage
  • Seminar presentations
What do you plan to do during the next reporting period to accomplish the goals? Nothing Reported

Impacts
What was accomplished under these goals?
Objective #1: We identified several open-source tools that can be used for hierarchical Bayesian modeling: jags, Stan, RStan, rjags, R2jags, runjags, mcmc and OpenBUGS. We identified the following packages for Bayesian Network Analysis (including constructing Bayesian Networks, analyzing the networks and visualizing results): bnlearn, gRain, deal, snow/parallel, igraph, sna and Gephi.
Objective #2: After installing and testing all of the tools for hierarchical Bayesian modeling, we found that Stan was best suited to our needs, as it is written in C++ (and is therefore very fast for large datasets) and it can accommodate mixed models. However, jags is currently more widely used (as Stan is newer), and therefore we implemented models in both jags and Stan. rjags is the most comprehensive and user-friendly R package for interfacing between jags and R, while RStan is the only currently available R interface to Stan. For Bayesian Network Analysis, we found bnlearn to be the most flexible (and the only such software that can accommodate both categorical and continuous data). To speed computation, we also implemented parallelization using the snow package. Finally, we were able to transfer the network characteristics into igraph and sna to perform network analysis, and then from igraph into Gephi for visualization.
Objective #3: Using the tools identified and validated above, we developed a new hierarchical Bayesian model that uses Dirichlet priors for proportional data to model shotgun metagenomic resistome counts that had previously been modeled using a non-Bayesian approach. We found that the Bayesian approach was more conservative and likely more accurate, as we were able to incorporate hyperpriors to account for the increased variance in shotgun metagenomic data.
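The exact specification of the resistome model is not given here. As a simplified sketch of why a Dirichlet prior makes estimated proportions more conservative, the Python code below performs the standard conjugate update of a Dirichlet prior with multinomial counts for a single sample. It then uses Monte Carlo draws, built from normalized Gamma variates, to obtain a credible interval. It assumes a fixed, flat prior (alpha = 1 for every category) instead of the hyperpriors described above, and the counts and function names are hypothetical.

```python
import random


def dirichlet_posterior(counts, alpha):
    """Conjugate update: Dirichlet(alpha) prior + multinomial counts -> Dirichlet(alpha + counts)."""
    if len(counts) != len(alpha):
        raise ValueError("counts and alpha must have the same length")
    return [a + c for a, c in zip(alpha, counts)]


def dirichlet_mean(params):
    """Expected proportions under a Dirichlet distribution."""
    total = sum(params)
    return [p / total for p in params]


def dirichlet_draw(params, rng):
    """One Dirichlet draw, built by normalizing independent Gamma(param, 1) variates."""
    gammas = [rng.gammavariate(p, 1.0) for p in params]
    total = sum(gammas)
    return [g / total for g in gammas]


def credible_interval(params, index, level=0.95, n_draws=5000, seed=0):
    """Monte Carlo equal-tailed credible interval for one proportion."""
    rng = random.Random(seed)
    values = sorted(dirichlet_draw(params, rng)[index] for _ in range(n_draws))
    lower = values[int((1 - level) / 2 * n_draws)]
    upper = values[int((1 + level) / 2 * n_draws) - 1]
    return lower, upper


if __name__ == "__main__":
    # Hypothetical read counts for three resistance classes in one sample
    counts = [3, 0, 1]
    alpha = [1.0, 1.0, 1.0]  # assumed flat prior; no hyperprior in this sketch
    posterior = dirichlet_posterior(counts, alpha)
    print("raw proportions:", [c / sum(counts) for c in counts])
    print("posterior means:", [round(x, 3) for x in dirichlet_mean(posterior)])
    print("95% interval, class 0:", credible_interval(posterior, 0))
```

With counts [3, 0, 1], the raw proportions are [0.75, 0, 0.25]. The posterior means are 4/7, 1/7 and 2/7 (about 0.57, 0.14 and 0.29). The zero count therefore no longer implies a proportion of exactly zero, and the dominant class is shrunk toward the prior. A hierarchical version would place a hyperprior on alpha so that the degree of shrinkage is estimated from the data.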
This hierarchical Bayesian model will help researchers focus on associations between management factors and antimicrobial resistance that are more likely to be "true positives" (as opposed to false positives, of which there are many using non-Bayesian methods). This can greatly increase the efficiency of shotgun metagenomic research, particularly as a hypothesis-generation tool. Using a Bayesian Network Analysis approach, we also uncovered novel associations in a dataset of microbiome, demographic, diagnostic and behavioral factors for human bacterial vaginosis. This dataset was used to validate the Bayesian Network approach because it is a well-studied, highly validated dataset with many disparate data types. We found that the Bayesian Network approach can both re-affirm already-known associations between microbiome, host and environment and uncover novel associations. This approach should translate readily to other datasets, and we are currently generating datasets with much more robust metadata in order to use this method.
Finally, we designed and validated a pre-sequencing enrichment system for antimicrobial resistance and virulence genes in shotgun metagenomic data. This system includes >23,000 baits that were custom-designed to capture >5,500 resistance and virulence genes, as well as randomly generated 22-mer unique molecular identifiers (UMIs) within sequencing adapters. Using the baits and UMIs, we are now able to count individual resistance genes within shotgun metagenomic data while accounting for PCR duplicates, and simultaneously increase on-target sequencing of resistance/virulence genes from an average of <0.1% up to nearly 50% of sequence data. This system will be made publicly available upon publication, and will allow shotgun metagenomic resistance researchers to greatly decrease sequencing costs while also improving the sensitivity of the approach.
This has already allowed us to uncover novel associations between environment and management practices that influence the presence of rare resistance elements -- associations which we were not able to identify without this pre-sequencing enrichment system. We also discovered that the resistome (i.e., all of the resistance elements in the microbiome) is much more diverse than previously known.
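UMI-based counting of the kind described above can be sketched as follows. Reads that map to the same gene and carry the same UMI are collapsed into one original molecule. The on-target fraction is the share of reads assigned to any targeted gene. This Python sketch makes several assumptions: reads have already been aligned and assigned to a gene (or to None when off-target), UMIs are matched exactly, and short made-up UMIs stand in for the 22-mers. The function names and reads are hypothetical. Dedicated deduplication tools typically also group reads by alignment position and tolerate sequencing errors within UMIs.

```python
from collections import defaultdict


def count_molecules(reads):
    """Count unique molecules per gene from (gene_id, umi) pairs.

    Reads that share both a gene assignment and a UMI are treated as PCR
    duplicates of one original molecule. gene_id is None for off-target reads.
    """
    umis_by_gene = defaultdict(set)
    for gene_id, umi in reads:
        if gene_id is not None:
            umis_by_gene[gene_id].add(umi)
    return {gene: len(umis) for gene, umis in umis_by_gene.items()}


def on_target_fraction(reads):
    """Fraction of reads assigned to any targeted resistance/virulence gene."""
    reads = list(reads)
    if not reads:
        return 0.0
    return sum(gene_id is not None for gene_id, _ in reads) / len(reads)


if __name__ == "__main__":
    # Hypothetical reads: (assigned gene, UMI); short UMIs stand in for 22-mers
    reads = [
        ("tetW", "ACGTAC"),
        ("tetW", "ACGTAC"),  # PCR duplicate of the read above
        ("tetW", "TTGACA"),
        ("mecA", "GGCCAT"),
        (None, "AATTCC"),    # off-target read
    ]
    print(count_molecules(reads))
    print(on_target_fraction(reads))
```

On these example reads, tetW is counted as two molecules (three reads, one of them a PCR duplicate), mecA as one, and the on-target fraction is 0.8.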

Publications

  • Type: Journal Articles Status: Accepted Year Published: 2016 Citation: Lakin SM*, Dean C*, Noyes NR*, Dettenwanger A, Ross A, Doster E, Rovira P, Abdo Z, Jones KL, Ruiz J, Belk KE, Morley PS, Boucher CA. MEGARes: an antimicrobial resistance database for high throughput sequencing. Nucleic Acids Research 2016; 45 (D1): D574-D580. *Co-first authors; these authors contributed equally to this work.
  • Type: Journal Articles Status: Under Review Year Published: 2016 Citation: Noelle R. Noyes, Maggie E. Weinroth, Jennifer Parker, Chris A. Dean, Steven E. Lakin, Robert A. Raymond, Pablo Rovira Sanz, Enrique Doster, Zaid Abdo, Jennifer Martin, Kenneth L. Jones, Jaime Ruiz, Christina A. Boucher, Keith E. Belk, Paul S. Morley. MEGaRICH: A Pre-Sequencing Capture System for Enriching and Counting Resistance Genes within Metagenomic Samples.
  • Type: Journal Articles Status: Awaiting Publication Year Published: 2016 Citation: Muggli M, Bowe A, Gagie T, Raymond R, Noyes NR, Morley PS, Belk KE, Puglisi S, Boucher CA. Succinct Colored de Bruijn Graphs.
  • Type: Journal Articles Status: Under Review Year Published: 2016 Citation: Joseph R. Owen, Noelle Noyes, Daniel J. Prince, Amy E. Young, Beate M. Crossley, Patricia C. Blanchard, Terry W. Lehenbauer, Sharif S. Aly, Jessica H. Davis, William J. Love, Sean M. O'Rourke, Zaid Abdo, Keith Belk, Michael R. Miller, Paul Morley, Alison L. Van Eenennaam. Whole-Genome Sequencing of Bacterial Isolates Associated With Bovine Respiratory Disease.
  • Type: Conference Papers and Presentations Status: Published Year Published: 2016 Citation: Noyes NR, Weinroth M, Doster E, Rovira Sanz P, Yang X, Dean C, Boucher CA, Jones KL, Abdo Z, Morley PS, Belk KE. The beef fecal resistome differs from other commodities. Beef Industry Safety Summit, Austin, TX, March 2016.
  • Type: Conference Papers and Presentations Status: Published Year Published: 2016 Citation: Noyes NR, Abdo Z, Boucher CA, Belk KE, Morley PS. Becoming Bayesian: research and other during my USDA NIFA postdoctoral fellowship. USDA Fellowship Meeting, Washington DC, August 2016.
  • Type: Conference Papers and Presentations Status: Published Year Published: 2016 Citation: Muggli M, Bowe A, Gagie T, Raymond R, Noyes NR, Morley PS, Belk KE, Puglisi S, Boucher CA. Succinct Colored de Bruijn Graphs. ISMB, Orlando, FL, July 2016.