Progress 06/15/20 to 06/14/22
Outputs Target Audience: The target audience of this work is the agencies and individuals involved in the US National Bovine Tuberculosis Eradication Program, including collaborators at the National Veterinary Services Laboratories and other academic institutions around the world interested in Mycobacterium bovis and Mycobacterium tuberculosis complex evolutionary biology. Changes/Problems:
Nothing Reported
What opportunities for training and professional development has the project provided? Kristina attended two virtual workshops as part of the University of Washington Summer Institute for Statistical Genomics series. Workshop 1: Population Genetics. Workshop 2: MCMC for Genetics. Kristina also participated in the following weekly seminars: Epi seminar fall 2020: SARS-CoV-2 epidemiology and public health interventions; Epi seminar spring 2021: Decision Making: Is our research helping, hindering, or making any difference?; BTRY 6890: Population Genetics journal club. Kristina also took Human Genomics (BioMG 6871) in the fall 2021 semester for general interest and to learn more about vertebrate genomics. She also attended the 2020 and 2021 Conferences for Research Workers in Animal Disease, as well as several virtual conferences including the Biodiversity Genomics conference and the Planetary Health meeting. Lastly, Kristina successfully defended her PhD thesis in May 2022.
How have the results been disseminated to communities of interest? Kristina gave a virtual oral presentation at the Conference for Research Workers in Animal Disease (CRWAD), 2020, Chicago IL, titled: Exploring mechanisms of accessory genome evolution in the clonally evolving Mycobacterium tuberculosis complex. She also presented a talk on the M. bovis pangenome work at the Conference for Research Workers in Animal Disease in Chicago IL in December 2021 and was awarded the American College of Veterinary Microbiologists Don Kahn Award for Best Overall Presentation. This work was published in Microbial Genomics in 2022. Kristina also participated in quarterly Zoom meetings and regular email communication with US National Bovine TB Eradication Program leaders including Dr. Suelee Robbe-Austerman, Dr. Kathy Orloski, Dr. Tyler Thacker and Dr. Claudia Perea.
What do you plan to do during the next reporting period to accomplish the goals?
Nothing Reported
Impacts What was accomplished under these goals?
Goal 1. Estimating time since herd infection and investigating drivers of variation in M. bovis evolutionary rate. Through Kristina's training, she learned that the success of Goal 1 relied heavily upon understanding how M. bovis evolves within hosts and between hosts, and how that evolution, quantified by acquired mutations, is related to time. Because a deep understanding of within-host evolution is necessary to develop predictive models, Kristina developed additional models beyond what was originally described in the project proposal and conducted careful sequence analyses to better understand M. bovis genome dynamics within hosts.
Year 1 Accomplishments: Development of a simulation framework to study within-host evolution. Our first task was to study how different evolutionary forces, including population bottlenecks and clonal progeny skew, impact the trajectory of mutations in a population, using a parameter estimation framework that does not rely on narrow assumptions such as that of equal-variance offspring distributions in Kingman's coalescent. Kristina developed a forward genetic simulation structure to study how demographic forces and infection dynamics influence the evolution of M. bovis. This framework was further developed in year 2.
Year 2 Accomplishments: Testing evolutionary hypotheses using simulation. We studied the within-host evolution of M. bovis using the empirical distribution of mutations that arise de novo within an animal host and forward genetic simulation to test hypotheses for evolutionary mechanisms within the host. We found that the empirical distribution of mutations does not deviate from a null expectation of random mutation, suggesting that drift, not selection, is the primary evolutionary force driving mutation during infection in animal hosts. We also found that a within-host evolutionary scenario involving skewed offspring distributions, panmictic within-host populations, and a rapid mutation rate was best supported by the data.
The diversity-generating effect of a faster mutation rate is counteracted by the diversity-reducing effect of large variance in offspring distributions, where a minority of individuals produces the offspring in the next generation. Our simulation framework can be applied to other aspects of M. bovis molecular epidemiology, including transmission analysis and studying the evolution of important phenotypes like antimicrobial resistance. During this time Kristina also developed a machine learning prediction framework to estimate characteristics of an outbreak using whole genome sequencing data. We are currently in the process of testing convolutional neural networks (CNNs) and other machine learning methods to predict the relationship between inputs (outbreak SNP alignments and indel presence/absence data) and time. CNNs are a class of deep learning models commonly used in image processing and are designed to recognize patterns in data without needing to compress input data into a predefined feature vector. A CNN framework was selected for this problem because of the ability of CNNs to accurately identify evolutionary processes from labeled population genetic data, with equal or greater accuracy than other methods. Additionally, CNNs take matrices as inputs, and sequence alignments can be easily formatted as matrices without loss of genome position information. The fitted model will allow us to estimate the relationship between mutations and real time to provide a method for determining the time since a herd was infected. The model will also be capable of estimating the population bottleneck size, analogous to the transmission dose, which is currently unknown. Furthermore, this modeling framework provides an alternative to coalescent-based methods for studying outbreak dynamics in clonally evolving, intracellular pathogens.
Goal 2. Predict the geographical source of outbreaks by analyzing pan-genomic population structure. Recent work published in 2021 suggested that M.
bovis has an open pangenome, implying that new genes would be discovered with each new genome sequenced. Since M. bovis is thought to evolve strictly clonally, with no known mechanism for horizontal gene transfer, we were skeptical of this result. In the absence of horizontal gene transfer, new genes should not be acquired by mechanisms other than duplication and mutation, so M. bovis should in theory have a limited capacity for new gene evolution compared to other prokaryotes.
Year 1 Accomplishments: Whole genome de novo assembly and pangenome characterization. Before addressing our goal of utilizing gene content variation in outbreak analysis, we needed to confirm whether the M. bovis pangenome is truly open, or if sequencing, annotation, or assembly errors artificially inflated the accessory gene count and pangenome size in previous studies. First, Kristina de novo assembled a sample of 1463 globally distributed M. bovis genomes, representative of all known M. bovis lineages. She then constructed the pangenome and developed a series of quality control analyses to ensure that each accessory gene identified was truly a variable gene, and not an annotation error.
Year 2 Accomplishments: Using this large sample of M. bovis genomes, and thorough bioinformatic analyses involving additional non-standard quality control procedures, we confirmed that the M. bovis pangenome is in fact compact, consistent with a closed pangenome and ongoing clonal evolution. Kristina also found that indel variation is common among outbreak sequences, even those with similar or identical SNP patterns, so we concluded that although gene content variation is very limited in M. bovis, indel variation could be a useful source of information in outbreak analysis. This project was published in Microbial Genomics in 2022.
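The open-versus-closed distinction discussed under Goal 2 can be illustrated with a pangenome accumulation curve: in a closed pangenome, the number of previously unseen genes contributed by each additional genome drops toward zero. The following is a minimal Python sketch under stated assumptions; the gene names, the toy genomes, and the function names are hypothetical and are not the project's data or pipeline.

```python
from typing import List, Set


def accumulation_curve(genomes: List[Set[str]]) -> List[int]:
    """Cumulative pangenome size after adding each genome in the given order."""
    seen: Set[str] = set()
    sizes: List[int] = []
    for genes in genomes:
        seen |= genes              # add any genes not observed so far
        sizes.append(len(seen))
    return sizes


def new_genes_per_genome(sizes: List[int]) -> List[int]:
    """Number of previously unseen genes contributed by each genome."""
    return [sizes[0]] + [b - a for a, b in zip(sizes, sizes[1:])]


# Hypothetical toy gene sets, one per genome (illustration only).
toy_genomes = [
    {"geneA", "geneB", "geneC"},
    {"geneA", "geneB", "geneC", "geneD"},
    {"geneA", "geneB", "geneC"},
    {"geneA", "geneB", "geneD"},
]

sizes = accumulation_curve(toy_genomes)
new_counts = new_genes_per_genome(sizes)
core_genes = set.intersection(*toy_genomes)   # genes present in every genome

print("pangenome size:", sizes)
print("new genes per genome:", new_counts)
print("core genes:", sorted(core_genes))
```

In practice the curve depends on the order in which genomes are added, so it is usually averaged over many random orderings, and spurious accessory genes from annotation or assembly errors would inflate the counts, which is why the quality control step described above matters.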
Publications
- Type:
Journal Articles
Status:
Accepted
Year Published:
2022
Citation:
Ceres, Kristina M., Stanhope, Michael J., Grohn, Yrjo T. "A critical evaluation of Mycobacterium bovis pangenomics, with reference to its utility in outbreak investigation." Microbial Genomics (2022).
|
Progress 06/15/21 to 06/14/22
Outputs Target Audience: The target audiences reached in the June 2021-June 2022 funding period include 1) USDA National Veterinary Services Laboratories scientists through virtual meetings, and 2) the general animal health community through a presentation at the Conference for Research Workers in Animal Disease in Chicago, IL, December 2021. Changes/Problems:
Nothing Reported
What opportunities for training and professional development has the project provided? I took Human Genomics (BioMG 6871) in the fall 2021 semester for general interest and to learn more about vertebrate genomics. In addition to the 2021 Conference for Research Workers in Animal Disease, I also attended several virtual conferences including the Biodiversity Genomics conference and the Planetary Health meeting.
How have the results been disseminated to communities of interest? I presented a talk on the M. bovis pangenome work at the Conference for Research Workers in Animal Disease in Chicago IL in December 2021 and was awarded the American College of Veterinary Microbiologists Don Kahn Award for Best Overall Presentation. This work is also accepted and pending open access publication at Microbial Genomics.
What do you plan to do during the next reporting period to accomplish the goals? This is the final reporting period; however, I plan to continue work on the convolutional neural network through the end of 2022. Specifically, I will use the training database developed during the funding period to train the convolutional neural network. The training process will include hyperparameter tuning to achieve optimal prediction accuracy. The model will then be tested on United States 2021-2022 M. bovis outbreak data provided by USDA partners at the National Veterinary Services Laboratories.
Impacts What was accomplished under these goals?
Goal 1. Estimating time since herd infection and investigating drivers of variation in M. bovis evolutionary rate. Success of Goal 1 relies upon understanding how M. bovis evolves within hosts and between hosts, and how that evolution, quantified by acquired mutations, is related to time. Our first task was to study how different evolutionary forces, including population bottlenecks and clonal progeny skew, impact the trajectory of mutations in a population, using a parameter estimation framework that does not rely on narrow assumptions such as that of equal-variance offspring distributions in Kingman's coalescent.
Goal 1a. Develop a parameter estimation framework for studying M. bovis within-host evolution using forward genetic simulation. We studied the within-host evolution of M. bovis using the empirical distribution of mutations that arise de novo within an animal host and forward genetic simulation to test hypotheses for evolutionary mechanisms within the host. We found that the empirical distribution of mutations does not deviate from a null expectation of random mutation, suggesting that drift, not selection, is the primary evolutionary force driving mutation during infection in animal hosts. We also found that a within-host evolutionary scenario involving skewed offspring distributions, panmictic within-host populations, and a rapid mutation rate was best supported by the data. The diversity-generating effect of a faster mutation rate is counteracted by the diversity-reducing effect of large variance in offspring distributions, where a minority of individuals produces the offspring in the next generation. Our simulation framework can be applied to other aspects of M. bovis molecular epidemiology, including transmission analysis and studying the evolution of important phenotypes like antimicrobial resistance.
Goal 1b.
Using the modeling framework designed in Goal 1a, develop a prediction model to estimate characteristics of an outbreak using whole genome sequencing data. We are developing a convolutional neural network (CNN) to predict the relationship between inputs (outbreak SNP alignments and indel presence/absence data) and time. CNNs are a class of deep learning models commonly used in image processing and are designed to recognize patterns in data without needing to compress input data into a predefined feature vector. A CNN framework was selected for this problem because of the ability of CNNs to accurately identify evolutionary processes from labeled population genetic data, with equal or greater accuracy than other methods. Additionally, CNNs take matrices as inputs, and sequence alignments can be easily formatted as matrices without loss of genome position information. The fitted model will allow us to estimate the relationship between mutations and real time to provide a method for determining the time since a herd was infected. The model will also be capable of estimating the population bottleneck size, analogous to the transmission dose, which is currently unknown. Furthermore, this modeling framework provides an alternative to coalescent-based methods for studying outbreak dynamics in clonally evolving, intracellular pathogens.
Goal 1a status: Analysis is complete, manuscript draft is complete, publication pending; anticipated completion August 2022.
Goal 1b status: Simulation design is complete, training database runs in progress; anticipated completion December 2022.
Goal 2. Predict the geographical source of outbreaks by analyzing pan-genomic population structure. Recent work published in 2021 suggested that M. bovis has an open pangenome, implying that new genes would be discovered with each new genome sequenced. Since M. bovis is thought to evolve strictly clonally, with no known mechanism for horizontal gene transfer, we were skeptical of this result.
In the absence of horizontal gene transfer, new genes should not be acquired by mechanisms other than duplication and mutation, so M. bovis should in theory have a limited capacity for new gene evolution compared to other prokaryotes. Before addressing our goal of utilizing gene content variation in outbreak analysis, we needed to confirm whether the M. bovis pangenome is truly open, or if sequencing, annotation, or assembly errors artificially inflated the accessory gene count and pangenome size in previous studies. First, I de novo assembled a sample of 1463 globally distributed M. bovis genomes, representative of all known M. bovis lineages. I then constructed the pangenome and developed a series of quality control analyses to ensure that each accessory gene identified was truly a variable gene, and not an annotation error. Using this large sample, and thorough bioinformatic analyses involving additional non-standard quality control procedures, we confirmed that the M. bovis pangenome is in fact compact, consistent with a closed pangenome and ongoing clonal evolution. I also found that indel variation is common among outbreak sequences, even those with similar or identical SNP patterns, so we concluded that although gene content variation is very limited in M. bovis, indel variation could be a useful source of information in outbreak analysis.
Goal 2 status: complete, manuscript accepted and awaiting publication at Microbial Genomics.
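The interplay described under Goal 1a, where a faster mutation rate adds diversity while high variance in offspring number removes it, can be illustrated with a toy forward simulation. This is only a sketch under stated assumptions, not the project's simulation framework: it assumes a haploid, constant-size, non-recombining population, at most one new mutation per birth, and a hypothetical `reproducer_fraction` parameter as a crude stand-in for skewed offspring distributions.

```python
import random
from typing import Dict, List, Tuple

Genome = Tuple[int, ...]  # IDs of the mutations carried by one cell lineage


def simulate(n: int, generations: int, reproducer_fraction: float,
             mu: float, seed: int) -> List[Genome]:
    """Forward-simulate n clonal lineages; only a fraction reproduce each generation."""
    rng = random.Random(seed)
    pop: List[Genome] = [()] * n            # start from a clonal population
    next_id = 0                             # every new mutation gets a unique ID
    k = min(n, max(1, int(n * reproducer_fraction)))
    for _ in range(generations):
        parents = rng.sample(pop, k)        # only k individuals leave offspring
        new_pop: List[Genome] = []
        for _ in range(n):
            genome = rng.choice(parents)    # offspring copies a random reproducer
            if rng.random() < mu:           # at most one new mutation per birth
                genome = genome + (next_id,)
                next_id += 1
            new_pop.append(genome)
        pop = new_pop
    return pop


def segregating_mutations(pop: List[Genome]) -> int:
    """Count mutations present in some, but not all, individuals."""
    counts: Dict[int, int] = {}
    for genome in pop:
        for mutation in genome:
            counts[mutation] = counts.get(mutation, 0) + 1
    return sum(1 for c in counts.values() if c < len(pop))


# Same mutation rate, different offspring skew (all values are illustrative).
even = simulate(n=100, generations=100, reproducer_fraction=1.0, mu=0.05, seed=1)
skewed = simulate(n=100, generations=100, reproducer_fraction=0.01, mu=0.05, seed=1)
print("segregating mutations, all individuals reproduce:", segregating_mutations(even))
print("segregating mutations, 1% reproduce:", segregating_mutations(skewed))
```

When only one individual reproduces per generation, every older mutation is either fixed or lost, so the only variation left is what arose in the most recent generation. A model of this kind can therefore match a given level of observed diversity with either a low mutation rate and mild skew or a higher mutation rate and strong skew, which is why the two parameters need to be estimated jointly.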
Publications
- Type:
Journal Articles
Status:
Accepted
Year Published:
2022
Citation:
Ceres, Kristina M., Stanhope, Michael J., Grohn, Yrjo T. "A critical evaluation of Mycobacterium bovis pangenomics, with reference to its utility in outbreak investigation." Microbial Genomics (2022)
|
Progress 06/15/20 to 06/14/21
Outputs Target Audience: The target audience was our USDA partners involved in the US National Bovine Tuberculosis Eradication Program. Through regular meetings in 2020 and 2021, we refined our project plan to optimize collaboration and Kristina's training goals. Changes/Problems:
Nothing Reported
What opportunities for training and professional development has the project provided? Kristina attended two virtual workshops as part of the University of Washington Summer Institute for Statistical Genomics series. Workshop 1: Population Genetics. Workshop 2: MCMC for Genetics. Kristina also participated in the following weekly seminars: Epi seminar fall 2020: SARS-CoV-2 epidemiology and public health interventions; Epi seminar spring 2021: Decision Making: Is our research helping, hindering, or making any difference?; BTRY 6890: Population Genetics journal club.
How have the results been disseminated to communities of interest? Virtual oral presentation at the Conference for Research Workers in Animal Disease (CRWAD), 2020, Chicago IL: Exploring mechanisms of accessory genome evolution in the clonally evolving Mycobacterium tuberculosis complex. Quarterly Zoom meetings and regular email communication with US National Bovine TB Eradication Program leaders including Dr. Suelee Robbe-Austerman, Dr. Kathy Orloski, Dr. Tyler Thacker and Dr. Claudia Perea.
What do you plan to do during the next reporting period to accomplish the goals?
Goal 1a: Understand the distinct roles of selection, demography, and progeny skew in M. bovis evolution over the course of an outbreak in a cattle herd.
- Complete simulations
- Submit manuscript 3 (Evolutionary pressures governing Mycobacterium bovis within and between host evolution)
Goal 1b: Develop a convolutional neural network (CNN) to predict the relationship between inputs (outbreak SNP alignments) and time.
- Complete test data simulations
- Develop CNN structure and tune hyperparameters
- Submit manuscript 4 (Using convolutional neural networks to uncover Mycobacterium bovis outbreak population dynamics)
Goal 2: Predict the geographical source of outbreaks by analyzing pan-genomic population structure.
- Publish manuscript 1 (Mycobacterium bovis pangenome evolution)
- Publish manuscript 2 (Genotype to outbreak source prediction using deep learning)
Impacts What was accomplished under these goals?
Goal 1: Estimate time since herd infection and investigate drivers of variation in M. bovis evolutionary rate. The focus of project 1 is to link M. bovis genetic diversity generated over the course of an outbreak to time, which depends on accurate estimation of mutation rate. Most methods for combining sampling time data with sequence data rely on a set of assumptions about the underlying mutation-generating process, including the assumption that each individual in a population has an equal chance of creating offspring. However, recent work has shown that M. tuberculosis has skewed offspring distributions caused by clonal replication and serial population bottlenecks created by transmission. In light of this recent work, we have modified Goal 1 to incorporate two parts.
Goal 1a is first to understand the distinct roles of selection, demography, and progeny skew in M. bovis evolution over the course of an outbreak in a cattle herd. Our hypothesis is that the dominant evolutionary mechanism that skews site frequency spectra toward rare variants is demography, characterized by transmission bottlenecks and subsequent compartmentalization of M. bovis within the host, which lead to highly structured populations. Results from this project will be used to build simulation models that accurately depict M. bovis within-host evolution.
Goal 1b is to develop a convolutional neural network (CNN) to predict the relationship between inputs (outbreak SNP alignments) and time. CNNs are a class of deep learning models commonly used in image processing and are designed to recognize patterns in data without needing to compress input data into a predefined feature vector. A CNN framework was selected for this problem because of the ability of CNNs to accurately identify evolutionary processes from labeled population genetic data, with equal or greater accuracy than other methods.
Additionally, CNNs take matrices as inputs, and sequence alignments can be easily formatted as matrices without loss of genome position information. The fitted model will allow us to estimate the relationship between mutations and real time to provide a method for determining the time since a herd was infected. The model will also be capable of estimating the population bottleneck size, analogous to the transmission dose, which is currently unknown. Furthermore, this modeling framework provides an alternative to coalescent-based methods for studying outbreak dynamics in clonally evolving, intracellular pathogens.
Goal 1 status: in progress.
Empirical data: Outbreak sequences have been identified, and phylogenetic trees and summary statistics for each outbreak have been created.
Simulation data: Key simulation parameter values are being estimated in coordination with USDA partners. After these parameters are estimated, simulations will commence on high-powered computers.
Test data: Simulations have been designed to generate test data. Exact parameter values depend on results from Goal 1a.
CNN structure: I have conducted a literature review to determine the best starting places for CNN structure design. During the testing process, model hyperparameters will be tuned to produce optimal results.
Goal 2: Predict the geographical source of outbreaks by analyzing pan-genomic population structure. The purpose of project 2 is first to determine how the M. bovis pangenome has evolved over space and time, and then to develop a model that can predict geographic location from M. bovis genome sequences. This project involves determining patterns of evolution in the core genome (genes shared by all members of the species) and the accessory genome (genes not present in all members), and determining whether these partitions of the pangenome evolve clonally. M. bovis is thought to be a strictly clonal pathogen, with no known mechanisms for ongoing horizontal gene transfer.
Therefore, my overarching hypothesis is that the accessory genome evolves through serial gene deletions, and that these deletions, along with purifying selection and serial bottlenecks created by transmission, cause strong population structure. This strong population structure will lead to genetic signatures that are specific to and diagnostic of geographic location. I have found evidence of events that look like gene deletion, consistent with my hypothesis, and, surprisingly, strong evidence for gene addition. Upon closer inspection of the gene addition events, I have found that many have arisen from gene duplication events, suggesting that gene duplication and subsequent mutation, instead of horizontal gene transfer, may be an important mechanism for generating gene content diversity in M. bovis. I am currently building supervised machine learning models with SNP data alone, and with SNP data and gene presence/absence data together, to explore whether gene presence/absence information increases prediction accuracy.
Goal 2 status: near complete.
Evolutionary analysis: complete, manuscript in preparation.
Geographic analysis: Genotype and location data were acquired from USDA partners; deep learning model in progress.
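The matrix inputs mentioned above (SNP alignments for the CNN in Goal 1b, and SNP data with or without gene presence/absence data for the supervised models in Goal 2) can be sketched as follows. This is a minimal Python illustration assuming a simple binary encoding relative to a reference sequence; the sequences, the presence/absence values, and the function names are hypothetical, and the encoding used in the project may differ.

```python
from typing import List


def snp_matrix(sequences: List[str], reference: str) -> List[List[int]]:
    """Encode an alignment as a 0/1 matrix: 1 where a sample differs from the reference.

    Rows are samples and columns are alignment positions, so genome position
    order is preserved in the column order.
    """
    for seq in sequences:
        if len(seq) != len(reference):
            raise ValueError("all sequences must be aligned to the reference length")
    return [[int(base != ref) for base, ref in zip(seq, reference)]
            for seq in sequences]


def add_presence_absence(snps: List[List[int]],
                         presence: List[List[int]]) -> List[List[int]]:
    """Append per-sample gene presence/absence columns to the SNP columns."""
    if len(snps) != len(presence):
        raise ValueError("need one presence/absence row per sample")
    return [snp_row + pa_row for snp_row, pa_row in zip(snps, presence)]


# Hypothetical toy alignment and presence/absence calls (illustration only).
reference = "ACGTA"
samples = ["ACGTA", "ACGTT", "TCGTA"]
snps = snp_matrix(samples, reference)
features = add_presence_absence(snps, [[1, 0], [1, 1], [0, 1]])

for row in features:
    print(row)
```

Training one model on `snps` alone and another on `features` is one way to test whether the extra columns improve prediction accuracy. This sketch treats any mismatch as a variant, so ambiguous base calls would need to be masked or handled separately before encoding.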
Publications
|