Source: UNIVERSITY OF GEORGIA submitted to
DEVELOPMENT AND APPLICATION OF BIOINFORMATIC APPROACHES FOR FOODBORNE PATHOGEN DETECTION, SUBTYPING AND GENOMIC EPIDEMIOLOGY INVESTIGATION
Sponsoring Institution
National Institute of Food and Agriculture
Project Status
TERMINATED
Funding Source
Reporting Frequency
Annual
Accession No.
1006141
Grant No.
(N/A)
Project No.
GEO00744
Proposal No.
(N/A)
Multistate No.
(N/A)
Program Code
(N/A)
Project Start Date
May 1, 2015
Project End Date
May 1, 2020
Grant Year
(N/A)
Project Director
Deng, XI.
Recipient Organization
UNIVERSITY OF GEORGIA
200 D.W. BROOKS DR
ATHENS, GA 30602-5016
Performing Department
Center for Food Safety
Non Technical Summary
Dramatic advances in high-throughput genome sequencing technologies and the rapid decrease in their costs have created opportunities for better detection, characterization and study of foodborne bacterial pathogens. However, compared with other areas of biomedical science, the application of these technologies in food microbiology is still quite limited. This 5-year project aims to promote the use of sequencing technologies and bioinformatics to address key and pressing issues in food safety.
Culture-independent pathogen detection and characterization.
The detection of bacterial foodborne pathogens often relies on culturing to obtain colonies or pre-enriched samples for morphological, molecular (e.g., polymerase chain reaction or PCR) or biochemical (e.g., enzyme-linked immunosorbent assay or ELISA) identification and confirmation of the presence of the pathogen. Compared with existing methods, metagenomics has a unique potential for sensitive detection and feature-rich characterization of foodborne pathogens directly from food matrices and environments in a culture-independent manner. A recent exploratory study by FDA provided an important proof of concept for metagenomic detection of Salmonella from fresh produce that eluded both culture and PCR detection methods (1). In theory, metagenomic sequencing can bypass culture enrichment and avoid its drawbacks, such as lengthy turnaround, enrichment biases and even questionable efficacy (2). In practice, however, the often low abundance and sporadic presence of pathogens in foods (compared with clinical samples) make it difficult for metagenomic sequencing to capture enough sequence features for robust identification and further subtyping. Notably, a culture enrichment step was still applied in the aforementioned study.
Pathogen subtyping.
Foodborne disease outbreaks are commonly caused by the consumption of food and water contaminated by bacterial, protozoan or viral pathogens.
The ancestral relationship and genetic relatedness among different isolates implicated in an outbreak provide vital information for epidemiological and outbreak investigations. The degree to which those isolates genetically differ from each other depends on how fast organisms alter their genetic structures (point mutations, insertions, deletions, etc.) and on how long the progeny of the common contaminant(s) have been able to diverge in a food or host environment before an outbreak is detected. One of the most powerful methods for studying the source of outbreaks is bacterial subtyping, the process of characterizing isolates at various subspecific levels to determine whether they might be related. Detection of possibly related organisms may suggest a common source of contamination. One key criterion for the usefulness of a subtyping technique is whether the polymorphic markers it targets can differentiate outbreak isolates from epidemiologically distinct but genetically related isolates.
Early subtyping schemes used phenotypic traits as epidemiologic markers. Commonly used methods in this category include biotyping, serotyping, phage typing and multilocus enzyme electrophoresis (3). Some of these phenotype-based methods are still used for preliminary characterization of certain bacterial pathogens in disease outbreak investigations. Because of the inherent drawbacks of phenotypic methods, especially limited discriminatory power and reproducibility, several DNA-based techniques, such as ribotyping, pulsed-field gel electrophoresis (PFGE), randomly amplified polymorphic DNA (RAPD), repetitive sequence-based PCR (Rep-PCR), multiple-locus variable-number tandem repeat (VNTR) analysis (MLVA) and multilocus sequence typing (MLST), have been adopted for molecular subtyping of various foodborne pathogens.
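Discriminatory power, cited above as a key limitation of phenotypic methods, is commonly quantified with the Hunter-Gaston discriminatory index, an application of Simpson's index of diversity: D = 1 - (1/(N(N-1))) Σ n_j(n_j - 1), where N is the number of isolates and n_j is the number of isolates assigned to subtype j. The following is a minimal Python sketch, assuming each isolate has already been assigned a subtype label by some method; the function name and example labels are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def discriminatory_index(types: Sequence[Hashable]) -> float:
    """Hunter-Gaston discriminatory index (Simpson's index of diversity).

    `types` holds the subtype label assigned to each isolate (for example a
    PFGE pattern or an MLST sequence type). The result is the probability
    that two isolates drawn at random without replacement have different
    subtypes.
    """
    n = len(types)
    if n < 2:
        raise ValueError("at least two isolates are required")
    # Number of ordered same-type pairs, summed over all subtypes.
    same_type_pairs = sum(c * (c - 1) for c in Counter(types).values())
    return 1.0 - same_type_pairs / (n * (n - 1))


if __name__ == "__main__":
    # Hypothetical labels for the same five isolates typed by two methods.
    method_a = ["P1", "P1", "P1", "P1", "P1"]  # one pattern for a clonal group
    method_b = ["S1", "S1", "S1", "S2", "S3"]  # finer resolution
    print(discriminatory_index(method_a))  # 0.0
    print(discriminatory_index(method_b))  # 0.7
```

D ranges from 0 (all isolates share one type) to 1 (every isolate has its own type), so computing it for two methods on the same isolate panel is a simple way to compare their resolution on a highly clonal lineage.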
DNA-based subtyping techniques have improved discriminatory power and reproducibility compared with phenotypic methods because they detect sequence polymorphisms in bacterial genomes, which are often more definitive and stable than phenotypic polymorphisms. Furthermore, some molecular methods have also improved the portability of subtyping data (i.e., inter-laboratory comparability of data). While gel electrophoresis banding-based methods such as PFGE and RAPD do not generate truly digital data, sequencing-based techniques such as MLST produce highly portable sequence data. It is recognized, however, that most current DNA-based subtyping techniques have inherent limitations, such as incomplete detection of the genomic variation that exists among different bacterial outbreak strains. This results in poor discriminatory power among strains that belong to a highly clonal lineage. As the technical and logistical barriers to obtaining complete genome sequences at scale rapidly diminish thanks to fast-evolving high-throughput whole genome sequencing (WGS) technologies, it is becoming increasingly feasible to upgrade existing molecular subtyping techniques to a new generation of whole-genome sequence-based methods.
Genomic epidemiology.
The advent of cost-efficient, high-quality and widely available WGS is poised to take infectious disease surveillance, and its benefits to public health, to a new level. First, with routine access to nearly all the genetic variation among genomes, WGS allows closely related isolates to be differentiated with unparalleled resolution, leading to much improved outbreak detection, source attribution and more precise and focused epidemiological investigation. Second, by making the entirety of genetic information readily available, WGS promises to integrate a myriad of parallel workflows typically employed at public health laboratories (e.g.
identification, serotyping, antimicrobial resistance testing, etc.) into a single, fast and efficient platform featuring in silico identification and prediction of various geno- and phenotypic features. This will substantially boost the capacity and capability for pathogen identification, subtyping and characterization, with the prospect of transforming public health microbiology.
Distinguishing individual lineages and, sometimes, epidemic clones (4) within a species allows the identification and tracking of the etiologic agents of bacterial infectious diseases (5). Clonal populations commonly encountered in recent and emerging epidemics often confound genetic and epidemiologic investigations because of their low genetic diversity (6). This issue is being tackled by recent applications of WGS, which enable fine-grained cataloging of genome-wide evolutionary events and the consequent construction of robust phylogenies. Furthermore, fueled by WGS and advanced analytical tools (e.g., Bayesian phylogenetics (7)), researchers have started to integrate spatiotemporal modeling of pathogen movement and migration into population genetics studies, uncovering unprecedentedly rich details about the emergence, transmission and adaptation of pathogens at scales ranging from a single hospital or community to an entire continent (8-10).
Research gaps and unanswered questions.
Despite the aforementioned advances in technologies and research, knowledge and technical gaps remain. Notably, the utilization of WGS data has been conspicuously outpaced by its generation: a large and exponentially growing volume of pathogen genome sequences remains largely untapped. Meanwhile, our understanding of the population structure and evolutionary dynamics of major foodborne pathogens needs to be updated and expanded, given access to genome-wide sequence features that are impenetrable to traditional subtyping tools.
Also, the practicality of whole genome sequencing technologies in food safety and public health applications depends on how efficiently sequencing data of the target pathogen can be generated from food samples.
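The genomic epidemiology approach summarized above ultimately compares isolates by the number of SNPs separating them. As a hedged illustration, the Python sketch below counts pairwise SNP differences between already-aligned core-genome SNP strings and groups isolates by single-linkage clustering under a distance threshold. The input format, isolate names and threshold are hypothetical; real SNP thresholds depend on the organism and the investigation and are not specified in this project.

```python
from itertools import combinations

AMBIGUOUS = set("N-")


def snp_distance(a: str, b: str) -> int:
    """Number of aligned positions at which two SNP sequences differ,
    skipping positions where either call is ambiguous ('N') or missing ('-')."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1 for x, y in zip(a, b)
        if x != y and x not in AMBIGUOUS and y not in AMBIGUOUS
    )


def single_linkage_clusters(seqs: dict[str, str], threshold: int) -> list[set[str]]:
    """Group isolates whose pairwise SNP distance is <= threshold.

    Links are chained transitively (single linkage) using a small union-find,
    so A and C share a cluster if A-B and B-C are both within the threshold.
    """
    parent = {name: name for name in seqs}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in combinations(seqs, 2):
        if snp_distance(seqs[a], seqs[b]) <= threshold:
            parent[find(a)] = find(b)

    clusters: dict[str, set[str]] = {}
    for name in seqs:
        clusters.setdefault(find(name), set()).add(name)
    return list(clusters.values())


if __name__ == "__main__":
    # Toy core-genome SNP strings; names and threshold are hypothetical.
    isolates = {
        "iso1": "ACGTACGT",
        "iso2": "ACGTACGA",
        "iso3": "ACGAACGA",
        "iso4": "TTTTTTTT",
    }
    print(single_linkage_clusters(isolates, threshold=1))
```

Here iso1 and iso3 differ by two SNPs yet land in the same cluster through iso2; this chaining is the defining behavior of single linkage and is one reason cluster definitions need epidemiological review.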
Animal Health Component
0%
Research Effort Categories
Basic
20%
Applied
40%
Developmental
40%
Classification

Knowledge Area (KA) | Subject of Investigation (SOI) | Field of Science (FOS) | Percent
712 | 4010 | 1100 | 100%
Goals / Objectives
The proposed research aims to explore the application of high-throughput sequencing in various aspects of the public health microbiology of major foodborne pathogens, especially Salmonella enterica. Specific objectives are:
1. Develop a culture-independent, metagenomics-inspired solution for detecting and characterizing Salmonella from food samples to the serotype level and beyond.
2. Develop bioinformatics tools for high-throughput sequencing-based pathogen subtyping and characterization.
3. Build robust genealogies and dissect population structures for major Salmonella serotypes.
4. Probe the emergence, spread and establishment of selected lineages of major Salmonella serotypes and explore how short-term evolution may signal or affect such events.
Project Methods
Objective 1
High-efficiency immunomagnetic separation (IMS). We will implement and validate a centrifugal microfluidic system newly developed by Dr. Peter Hesketh at Georgia Institute of Technology, which allows highly effective capture of bacterial cells through efficient mixing of magnetic beads with the food sample. We will use 1) freshly harvested cantaloupes from a local farm, without any processing, to represent the farm and harvest phases of application with high levels of naturally occurring microflora; and 2) pre-packaged, pre-washed Romaine lettuce from a retail source to represent the retail and consumer phases with low levels of microflora. Both items have been implicated in recent Salmonella outbreaks and are subjects of our previous research. Salmonella cells will be inoculated on the surface of the fresh produce. We will apply the CD microfluidics to capture and separate Salmonella from produce rinse liquids. The performance of our system will also be compared with that of traditional IMS.
Multiple displacement amplification (MDA). Following IMS, we will use MDA to amplify the often trace amounts of target pathogen DNA to facilitate subsequent genome sequencing. MDA has been shown to support whole genome sequencing (WGS) from a single E. coli cell (11), and the combined use of IMS and MDA has enabled WGS of difficult-to-culture pathogens from clinical samples (12). Commercially available MDA kits will be evaluated together with the modified IMS system to investigate how efficiently their combined use generates Salmonella DNA for subsequent sequencing.
High-throughput sequencing and bioinformatics analysis. Following IMS and MDA, the DNA sample will be sequenced on an Illumina MiSeq instrument. The raw sequencing reads will be analyzed to determine serotypes and other subtypes (see Objective 2).
Validation.
The entire workflow will be validated with 25 g aliquots of fresh produce (cantaloupe rind and lettuce leaves) at 3 spike levels of inoculum (i.e., fractional, 1 log higher and blank).
Objective 2
High-throughput sequencing-based determination of Salmonella serotype. We will develop a bioinformatics solution (termed "SeqSero") for high-throughput sequencing-based serotyping. The pipeline will be designed to accept both genome assemblies and sequencing reads through a web interface. These data will be processed by a pipeline implemented on a cloud server to determine the O and H antigens by comparison against individual curated databases of alleles of the genes responsible for serotype. The serotype will then be called according to the Kauffmann-White scheme. We will build individual databases for the three genetic determinants of Salmonella serotype: the two flagellin structural genes fliC and fljB (encoding H antigens) and the rfb gene cluster (encoding genes responsible for the O antigen). We will try to include all currently available sequences from our previous studies (13, 14), the literature (15) and GenBank, and all databases will be periodically updated to include novel sequences. Target sequences will be extracted by locating conserved sequences bordering the rfb region, or by in silico PCR amplification of fliC and fljB using primers flanking variable regions within the genes, and will be compared to the serotype determinant databases through BLAST (16). Input reads will be mapped directly to the sequences in each database using BWA (17) based on sequence similarity. Closely related fliC and fljB alleles that are indistinguishable after read mapping will be subjected to finer-scale differentiation targeting signature SNPs or indels.
High-throughput sequencing-based identification and subtyping of Shiga toxin-producing E. coli (STEC).
Using a similar approach, we will develop another pipeline that allows: 1) O and H antigen determination for major STEC serotypes; 2) detection and subtype identification of major virulence factors, including stx1, stx2, eae, espP and O island 122 (OI-122); and 3) seven-gene multilocus sequence typing (MLST) of E. coli with the full allele databases from http://www.mlst.net/.
Objective 3
Isolates. For WGS, we will select 100-150 isolates from major serotypes (Newport, Heidelberg, Infantis, Javiana, Saintpaul, Montevideo, Oranienburg, Thompson and Muenchen) to represent the aforementioned diversity according to our current knowledge.
Genome-wide detection of single nucleotide polymorphisms (SNPs). Streamlined SNP detection will be performed with our published method (18) by mapping sequencing reads to fully assembled reference genomes.
Phylogenetic analyses. As we previously described (18), recombination events and highly homoplastic sites indicative of non-neutral evolution, horizontal gene transfer or ambiguous SNP calls will be detected and excluded from phylogenetic reconstruction. Maximum-likelihood (ML) trees based on the remaining core genome SNPs will be built and used to test for a temporal signal based on the isolation year of each strain.
Objective 4
Age dating of individual lineages. Bayesian phylogenetic analyses will be performed using the latest version of BEAST (7) to establish a temporal framework for reconstructing phylogenetic relationships among the isolates and to estimate parameters describing the evolutionary dynamics of the populations, as we previously described (18).
Geographical distribution and transmission. Lineages that display clustering of isolates from a particular geographical location will be identified. Serotypes featuring geographically structured populations will be subjected to phylogeographical analysis implemented in BEAST (7). In a genealogy (phylogenetic tree), every branch will be assigned a geographical source.
Together with the estimated age of each internal node, this will resolve major geographical transmissions by inferring their timing and direction of movement.
Lineages associated with specific ecological niches or environments (e.g., animal hosts) will be identified. Clustered regularly interspaced short palindromic repeats (CRISPRs) will be extracted from these lineages to study their potential utility as environmental markers. CRISPRs are commonly found in bacteria and archaea, and their spacers are derived from phages and plasmids, so they may bear the ecological signature of a particular habitat. Our recent study (manuscript under review) shows that CRISPRs alone could resolve the major lineages of Salmonella Enteritidis (SE), including those with different ecological backgrounds.
Population dynamics. Temporal changes in effective population sizes and fluctuations in the numbers of lineages over time will be modeled to study the general population dynamics of major serotypes in recent history.
In-depth evolutionary and comparative genomics studies. Based on the aforementioned analyses, we will select 3-5 serotypes whose sampled populations show interesting features such as emerging lineages or sublineages, dynamic geographical dispersion, potential niche adaptation, and rapid expansion or diversification. We will assemble and annotate their genomes, compare gene contents between lineages, and extend the evolutionary analyses to pan-genomes by including accessory genes (in contrast with core genes shared by every member of a population), which afford selective advantages (e.g., antimicrobial resistance and virulence factors) and account for most of the genetic diversity within recently emerged pathogens (19).
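The core/accessory distinction drawn in the last paragraph can be made concrete with a gene presence/absence table. This is a minimal sketch, assuming gene (ortholog) clusters have already been computed by some clustering tool not shown here; the function name and toy data are hypothetical. Many pan-genome tools relax the strict "every genome" cutoff to a fraction such as 99%, which the `core_fraction` parameter mimics.

```python
def partition_pan_genome(
    presence: dict[str, set[str]], core_fraction: float = 1.0
) -> tuple[set[str], set[str]]:
    """Split the pan-genome of a population into core and accessory genes.

    `presence` maps each genome ID to the set of gene (ortholog cluster) IDs
    detected in it. A gene is core if it occurs in at least `core_fraction`
    of the genomes; the default of 1.0 matches the strict "shared by every
    member" definition. Every other gene is accessory.
    """
    n = len(presence)
    if n == 0:
        raise ValueError("no genomes supplied")
    counts: dict[str, int] = {}
    for genes in presence.values():
        for gene in genes:
            counts[gene] = counts.get(gene, 0) + 1
    core = {g for g, c in counts.items() if c >= core_fraction * n}
    return core, set(counts) - core


if __name__ == "__main__":
    # Hypothetical presence/absence data for three genomes.
    genomes = {
        "genome1": {"fliC", "invA", "blaCMY-2"},
        "genome2": {"fliC", "invA"},
        "genome3": {"fliC", "invA", "sopE"},
    }
    core, accessory = partition_pan_genome(genomes)
    print(sorted(core), sorted(accessory))  # ['fliC', 'invA'] ['blaCMY-2', 'sopE']
```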

Progress 05/01/15 to 05/01/20

Outputs
Target Audience: Food safety and public health professionals
Changes/Problems: Nothing Reported
What opportunities for training and professional development has the project provided? A graduate student completed his PhD based on his work on this project. He continued this work as a postdoctoral associate in my laboratory.
How have the results been disseminated to communities of interest? The PI gave invited seminars at the International Association for Food Protection Annual Meeting, Pennsylvania State University, Illinois Institute of Technology, the Institute of Microbiology of the Chinese Academy of Sciences, the bioMerieux Food Safety Symposium, and the Mars Global Food Safety Center.
What do you plan to do during the next reporting period to accomplish the goals? Nothing Reported

Impacts
What was accomplished under these goals? We continued to improve SeqSero2, a bioinformatics tool we developed for rapid and accurate Salmonella serotype prediction. SeqSero2 is being used nationally and globally as a go-to tool for Salmonella serotyping. It has been validated and routinely used by CDC, FDA, and USDA. It has been incorporated into NCBI Pathogen Detection (https://www.ncbi.nlm.nih.gov/pathogens/) and EnteroBase (https://enterobase.warwick.ac.uk/).

Publications

  • Type: Journal Articles Status: Published Year Published: 2020 Citation: Xu F, Ge C, Luo H, Li S, Wiedmann M, Deng X, Zhang G, Stevenson A, Baker RC, Tang S. 2020. Evaluation of real-time nanopore sequencing for Salmonella serotype prediction. Food Microbiology DOI:10.1016/j.fm.2020.103452
  • Type: Journal Articles Status: Published Year Published: 2020 Citation: Forghani F, Li S, Zhang S, Mann DA, Deng X, den Bakker HC, Diez-Gonzalez F. 2020. Detection and serotyping of Salmonella and Escherichia coli in wheat flour by a quasimetagenomic approach assisted by magnetic capture, multiple displacement amplification and real-time sequencing. Applied and Environmental Microbiology DOI:10.1128/AEM.00097-20
  • Type: Journal Articles Status: Published Year Published: 2020 Citation: Townsend A, Li S, Mann DA, Deng X. 2020. A quasimetagenomics method for concerted detection and subtyping of Salmonella enterica and E. coli O157:H7 from romaine lettuce. Food Microbiology DOI:10.1016/j.fm.2020.103575
  • Type: Journal Articles Status: Accepted Year Published: 2020 Citation: Li S, Zhang S, Deng X. 2020. GC content-associated bias caused by library preparation method may infrequently affect Salmonella serotype prediction using SeqSero2. Applied and Environmental Microbiology DOI:10.1128/AEM.00614-20
  • Type: Journal Articles Status: Published Year Published: 2020 Citation: Li S, Mann DA, Zhang S, Yan Q, Meinersmann J, Deng X. 2020. Microbiome-informed food safety and quality: longitudinal consistency and cross-sectional distinctiveness of retail chicken breast microbiomes. mSystems DOI:10.1128/mSystems.00589-20
  • Type: Journal Articles Status: Published Year Published: 2021 Citation: Deng X, Chao S, Horn A. 2021. Emerging applications of machine learning in food safety. Annual Review of Food Science and Technology 12


Progress 10/01/18 to 09/30/19

Outputs
Target Audience: Food safety and public health professionals
Changes/Problems: Nothing Reported
What opportunities for training and professional development has the project provided? Nothing Reported
How have the results been disseminated to communities of interest? Our paper on Salmonella source attribution has been reported by over 15 news outlets, including The Verge, EurekAlert by the American Association for the Advancement of Science, and IFT's Food Safety Magazine. It was featured in a CDC podcast and in Nature Reviews Microbiology (News & Analysis). It was described by the Deputy Commissioner of the Food and Drug Administration (FDA) as "a new era of smarter food safety & epidemiology". Dr. Deng has been invited to speak about this work at FDA, the China International Food Safety & Quality Conference, the bioMerieux Annual Food Safety Symposium in Canada, the Institute of Microbiology at the Chinese Academy of Sciences, and the Mars Global Research Center.
What do you plan to do during the next reporting period to accomplish the goals? Nothing Reported

Impacts
What was accomplished under these goals? We expanded the IMS-MDA approach for Salmonella detection to poultry environmental samples. We launched SeqSero2, an algorithmic transformation and functional update of the original SeqSero, which had been used worldwide for Salmonella serotype prediction from WGS data. We developed a machine learning approach for zoonotic source attribution of Salmonella using WGS data. We investigated the implications of mobile genetic elements for SNP typing of Salmonella.
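Serotype prediction from raw reads requires deciding which reference allele (for example, which fliC or fljB allele) the reads support. One common way to do this without assembly is k-mer matching. The sketch below illustrates that general idea under toy assumptions (a tiny k and short made-up sequences); it is not SeqSero2's actual algorithm, and the function names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical_kmers(seq: str, k: int) -> set[str]:
    """Distinct canonical k-mers of `seq`: each k-mer is replaced by the
    lexicographically smaller of itself and its reverse complement, so reads
    from either strand can match the reference."""
    out = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        rc = kmer.translate(_COMPLEMENT)[::-1]
        out.add(min(kmer, rc))
    return out


def best_allele(reads: list[str], alleles: dict[str, str], k: int = 21) -> tuple[str, float]:
    """Return the reference allele whose k-mers are best covered by the reads,
    together with the fraction of that allele's k-mers seen in the reads."""
    read_kmers: set[str] = set()
    for read in reads:
        read_kmers |= canonical_kmers(read, k)
    best_name, best_frac = "", -1.0
    for name, seq in alleles.items():
        ref = canonical_kmers(seq, k)
        if not ref:
            continue  # allele shorter than k
        frac = len(ref & read_kmers) / len(ref)
        if frac > best_frac:
            best_name, best_frac = name, frac
    return best_name, best_frac


if __name__ == "__main__":
    # Toy sequences and a tiny k for readability; real alleles are far longer.
    alleles = {"allele_A": "ACGTTGCAAGCTTACG", "allele_B": "TTGACCATGGTCAAGA"}
    reads = ["ACGTTGCAAG", "GCAAGCTTACG"]
    print(best_allele(reads, alleles, k=5))  # ('allele_A', 1.0)
```

Scoring by the fraction of an allele's k-mers observed, rather than by raw k-mer counts, keeps long and short alleles comparable; a real tool would also need rules for ties between closely related alleles, which the text notes may require signature SNPs or indels.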

Publications

  • Type: Journal Articles Status: Published Year Published: 2019 Citation: Hyeon J, Mann DA, Townsend AM, Deng X. 2018. Quasi-metagenomics analysis of Salmonella from food and environmental samples. Journal of Visualized Experiments (140), e58612, doi:10.3791/58612
  • Type: Journal Articles Status: Published Year Published: 2019 Citation: Hyeon J, Mann DA, Wang J, Kim W, Deng X. 2019. Rapid detection of Salmonella in poultry environmental samples using real-time PCR coupled with immunomagnetic separation and whole genome amplification. Poultry Science DOI:10.3382/ps/pez425
  • Type: Journal Articles Status: Published Year Published: 2019 Citation: Zhang S, Li S, Gu W, den Bakker H, Boxrud D, Taylor A, Roe C, Driebe E, Engelthaler DM, Allard M, Brown E, McDermott P, Zhao S, Bruce BB, Trees E, Fields PI, Deng X. 2019. Zoonotic Source Attribution of Salmonella enterica Serotype Typhimurium Using Genomic Surveillance Data, United States. Emerging Infectious Diseases 25(1): 82-91
  • Type: Journal Articles Status: Published Year Published: 2019 Citation: Zhang S, den Bakker H, Li S, Chen J, Dinsmore BA, Lane C, Lauer AC, Fields PI, Deng X. 2019. SeqSero2: Rapid and improved Salmonella serotype determination using whole genome sequencing data. Applied and Environmental Microbiology 85:e01746-19. DOI:10.1128/AEM.01746-19
  • Type: Journal Articles Status: Published Year Published: 2019 Citation: Li S, Zhang S, Baert L, Jagadeesan B, Ngom-Bru C, Griswold T, Katz LS, Carleton HA, Deng X. 2019. Implications of mobile genetic elements for Salmonella enterica single nucleotide polymorphism subtyping and source tracking investigations. Applied and Environmental Microbiology 85:e01985-19. DOI: 10.1128/AEM.01985-19
  • Type: Journal Articles Status: Published Year Published: 2019 Citation: Katz L, Griswold T, Morrison SS, Caravas JA, Zhang S, den Bakker HC, Deng X, Carleton HA. 2019. Mashtree: a rapid comparison of whole genome sequence files. Journal of Open Source Software 4(44), 1762 DOI: 10.21105/joss.01762


Progress 10/01/17 to 09/30/18

Outputs
Target Audience: Food safety and public health professionals
Changes/Problems: Nothing Reported
What opportunities for training and professional development has the project provided? Nothing Reported
How have the results been disseminated to communities of interest? Nothing Reported
What do you plan to do during the next reporting period to accomplish the goals? We plan to extend the quasi-metagenomics method to other foodborne pathogens and food matrices.

Impacts
What was accomplished under these goals? We developed a workflow, termed quasi-metagenomics, that combines Salmonella detection and subtyping from food samples in a single procedure. The workflow includes a short culture enrichment, immunomagnetic separation (IMS), whole genome amplification by multiple displacement amplification (MDA), and genome sequencing. Coupled with a real-time, portable sequencer, it enabled us to detect and subtype Salmonella on lettuce to the strain level within 24 h. We also applied this method to retail chicken and black peppercorns.
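Whether a quasi-metagenomic run yields enough of the target genome for strain-level subtyping depends largely on how much of the sequence comes from the pathogen. A standard back-of-the-envelope tool is the Lander-Waterman model: with N reads of mean length L, a fraction f of which derive from a target genome of size G, mean depth is c = N * L * f / G and the expected fraction of the genome covered at least once is 1 - e^(-c). The Python sketch below applies this model; all run parameters in the example are hypothetical, and uneven MDA amplification means real coverage will be patchier than this estimate.

```python
import math


def expected_target_coverage(
    total_reads: int, mean_read_len: float, target_fraction: float, genome_size: int
) -> tuple[float, float]:
    """Lander-Waterman style estimate for one target genome in a mixed sample.

    Returns (mean depth c, expected fraction of the target genome covered by
    at least one read, 1 - exp(-c)). Assumes reads fall uniformly at random
    on the target genome, which MDA amplification bias will violate.
    """
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    depth = total_reads * mean_read_len * target_fraction / genome_size
    return depth, 1.0 - math.exp(-depth)


if __name__ == "__main__":
    # Hypothetical run: 20,000 reads averaging 3,000 bases, with 0.1%, 1% or
    # 10% of reads from Salmonella, against a genome of roughly 5 Mb.
    for frac in (0.001, 0.01, 0.1):
        depth, breadth = expected_target_coverage(20_000, 3_000, frac, 5_000_000)
        print(f"target fraction {frac:>5}: depth {depth:.2f}x, breadth {breadth:.1%}")
```

The steep dependence on the target fraction is why the enrichment, IMS and MDA steps that raise the share of pathogen DNA matter as much as total sequencing output.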

Publications

  • Type: Journal Articles Status: Published Year Published: 2018 Citation: Hyeon J, Li S, Mann DA, Zhang S, Li Z, Chen Y, Deng X. 2018. Quasimetagenomics-based and real-time-sequencing-aided detection and subtyping of Salmonella enterica from food samples. Applied and Environmental Microbiology 84:e02340-17


Progress 10/01/16 to 09/30/17

Outputs
Target Audience: Food safety and public health professionals.
Changes/Problems: Nothing Reported
What opportunities for training and professional development has the project provided? Nothing Reported
How have the results been disseminated to communities of interest? Nothing Reported
What do you plan to do during the next reporting period to accomplish the goals? We plan to combine genome sequencing with our sample preparation workflow to achieve Salmonella detection and subtyping in the same workflow. We also plan to investigate the possibility of Salmonella source attribution using whole genome sequencing data.

Impacts
What was accomplished under these goals? We finalized and published a workflow for rapid and effective concentration of Salmonella DNA from food samples that improved Salmonella testing by real-time PCR. We performed and published a comparative study of major SNP pipelines for high-resolution foodborne pathogen subtyping. We assisted in a study that surveyed Listeria monocytogenes from diverse food samples in Shanghai, China.
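One simple starting point for comparing SNP pipelines is to measure how far their SNP calls for the same isolate and reference agree. The sketch below is a hypothetical illustration of such a comparison using set operations and Jaccard similarity; it is not presented as the method of the published comparative study, and the variant tuple format is an assumption.

```python
Variant = tuple[str, int, str]  # (contig, 1-based position, alternate base)


def compare_snp_calls(calls_a: set[Variant], calls_b: set[Variant]) -> dict[str, float]:
    """Summarize agreement between SNP call sets from two pipelines run on
    the same isolate against the same reference genome."""
    shared = calls_a & calls_b
    union = calls_a | calls_b
    return {
        "shared": len(shared),
        "only_a": len(calls_a - calls_b),
        "only_b": len(calls_b - calls_a),
        # Jaccard similarity: shared calls over all distinct calls.
        "jaccard": len(shared) / len(union) if union else 1.0,
    }


if __name__ == "__main__":
    # Hypothetical calls; position 200 is called with different alleles.
    pipeline_a = {("chr", 100, "A"), ("chr", 200, "T"), ("chr", 300, "G")}
    pipeline_b = {("chr", 100, "A"), ("chr", 200, "C"), ("chr", 400, "G")}
    print(compare_snp_calls(pipeline_a, pipeline_b))
    # {'shared': 1, 'only_a': 2, 'only_b': 2, 'jaccard': 0.2}
```

Because the alternate base is part of each key, a site called by both pipelines with different alleles counts as a disagreement rather than a match.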

Publications

  • Type: Journal Articles Status: Published Year Published: 2017 Citation: Hyeon J and Deng X. 2017. Rapid detection of Salmonella in raw chicken breast using real-time PCR combined with immunomagnetic separation and whole genome amplification. Food Microbiology 63:111-116
  • Type: Journal Articles Status: Published Year Published: 2017 Citation: Katz LS, Griswold T, Williams-Newkirk AJ, Wagner D, Petkau A, Sieffert C, Domselaar GV, Deng X, Carleton HA. 2017. A comparative analysis of the Lyve-SET phylogenomics pipeline for genomic epidemiology for foodborne pathogens. Frontiers in Microbiology doi:10.3389/fmicb.2017.00375
  • Type: Journal Articles Status: Published Year Published: 2017 Citation: Wang W, Zhou X, Suo Y, Deng X, Cheng M, Shi C, Shi X. 2017. Prevalence, serotype diversity, biofilm-forming ability and eradication of Listeria monocytogenes isolated from diverse foods in Shanghai, China


Progress 10/01/15 to 09/30/16

Outputs
Target Audience: Nothing Reported
Changes/Problems: Nothing Reported
What opportunities for training and professional development has the project provided? Nothing Reported
How have the results been disseminated to communities of interest? Nothing Reported
What do you plan to do during the next reporting period to accomplish the goals? The IMS-MDA approach will be used in combination with high-throughput sequencing for "quasi-metagenomic" detection and characterization of foodborne pathogens from food and environmental samples. A software tool will be developed for WGS-based source prediction of major Salmonella serotypes.

Impacts
What was accomplished under these goals? We developed a method that combines immunomagnetic separation (IMS), whole genome amplification by multiple displacement amplification (MDA) and real-time PCR for detecting a bacterial pathogen in a food sample. This method enabled real-time PCR detection of low levels of Salmonella enterica serotype Enteritidis (SE) (~10 CFU/g) in raw chicken breast without culture enrichment. In addition, it detected refrigeration-stressed SE cells at lower concentrations (~0.1 CFU/g) in raw chicken breast after a 4-h culture enrichment, shortening the detection process from days to hours with no statistically significant difference in detection rate compared with a culture-based detection method. By substantially improving SE detection over conventional real-time PCR, we demonstrated the potential of IMS-MDA real-time PCR as a rapid, sensitive and affordable method for detecting Salmonella in food.
From a large-scale sampling of 1,268 Salmonella Typhimurium genomes from various sources and locations, phylogenetic clustering of isolates attributable to the same source was observed in multiple cases, including population groups overrepresented by isolates from poultry, bovine and porcine samples. Ecological adaptation, as well as industry practice or structure, may explain the observed association between a specific population group (or clade) and a particular domestic (food) or wild animal source. Representative isolates from an avian (wild bird) clade and a porcine clade displayed metabolic profiles distinct from those of representative isolates from other population groups, featuring a systematic inability or reduced ability to utilize multiple nitrogen substrates.
The narrower range of utilizable substrates might be indicative of adaptation to a specific host or environment. Isolates from clades associated with industrial food animals (poultry, bovine and porcine) displayed a higher abundance of multiple acquired antibiotic resistance genes. Two of these clades carried temporal signals of evolution, which revealed their recent origins in the 1990s. Both lines of evidence suggest an impact of industry practice on the emergence and adaptation of such clades. The identification of recognizable patterns of source distribution and of geno- and phenotypically distinct ST clades laid the foundation for developing WGS-based source attribution methods for this important foodborne pathogen.
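A temporal signal like the one mentioned above is commonly checked before Bayesian dating by regressing each tip's root-to-tip distance in a maximum-likelihood tree against its sampling year: a positive slope with reasonable fit suggests clock-like evolution, and the x-intercept gives a crude date for the root. The Python sketch below implements that regression with plain least squares on toy numbers; extracting root-to-tip distances from a rooted tree is assumed to have been done elsewhere, and because tips share branches the points are not independent, so the fit is exploratory rather than a formal test.

```python
def root_to_tip_regression(
    years: list[float], distances: list[float]
) -> tuple[float, float, float]:
    """Least-squares regression of root-to-tip distance on sampling year.

    Returns (slope, x_intercept, r_squared). The slope approximates the
    evolutionary rate (distance units per year); the x-intercept, where the
    fitted distance reaches zero, is a rough estimate of the root's date.
    """
    n = len(years)
    if n != len(distances) or n < 3:
        raise ValueError("need at least three paired observations")
    mean_x = sum(years) / n
    mean_y = sum(distances) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    syy = sum((y - mean_y) ** 2 for y in distances)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, distances))
    if sxx == 0 or sxy == 0:
        raise ValueError("no temporal spread or no slope to estimate")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy * sxy / (sxx * syy)
    return slope, -intercept / slope, r_squared


if __name__ == "__main__":
    # Toy data: distances in SNPs, accumulating 2 SNPs per year since 1995.
    years = [2000, 2005, 2010, 2015]
    distances = [10, 20, 30, 40]
    print(root_to_tip_regression(years, distances))  # (2.0, 1995.0, 1.0)
```

Real data scatter around the line, so a weak fit is a cue to inspect outliers or recombination before running BEAST rather than a verdict on its own.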

Publications

  • Type: Journal Articles Status: Published Year Published: 2016 Citation: Feasey AN, Hadfield J, Keddy KH, Dallman TJ, Jacobs J, Deng X et al. 2016. Distinct Salmonella Enteritidis lineages associated with enterocolitis in high-income settings and invasive disease in low-income settings. Nature Genetics; doi:10.1038/ng.3644
  • Type: Journal Articles Status: Published Year Published: 2016 Citation: Deng X, den Bakker HC, Hendriksen RS. 2016. Genomic epidemiology: Whole-genome Sequencing-powered surveillance and outbreak investigation of foodborne bacterial pathogens. Annual Review of Food Science and Technology; Vol. 7: 353-374
  • Type: Books Status: Awaiting Publication Year Published: 2016 Citation: Deng X, den Bakker HC, Hendriksen RS (Eds.) Applied Genomics of Foodborne Pathogens. Springer, New York, USA
  • Type: Book Chapters Status: Awaiting Publication Year Published: 2016 Citation: den Bakker HC, Strawn LK, Deng X 2016. Bioinformatics aspects of foodborne pathogen research. In Applied Genomics of Foodborne Pathogens, edited by Deng X, den Bakker HC, Hendriksen RS. Springer. New York, USA
  • Type: Journal Articles Status: Under Review Year Published: 2017 Citation: Hyeon J and Deng X. 2016. Rapid detection of Salmonella in raw chicken breast using real-time PCR combined with immunomagnetic separation and whole genome amplification. Food Microbiology


Progress 05/01/15 to 09/30/15

Outputs
Target Audience: Public health and food safety researchers from academia, government and industry.
Changes/Problems: Nothing Reported
What opportunities for training and professional development has the project provided? The graduate student working on this project presented his work at a workshop in the Enteric Diseases Laboratory Branch at CDC and presented a poster at the 2015 IAFP Annual Meeting, where he was a finalist for the Developing Scientist Award.
How have the results been disseminated to communities of interest? The results were published in the Journal of Clinical Microbiology. A poster was presented at the 2015 IAFP Annual Meeting. An oral presentation was given at the 1st ASM Conference on Rapid NGS Bioinformatic Pipelines for Enhanced Molecular Epidemiologic Investigation of Pathogens. Researchers from CDC, FDA and USDA were invited to beta test SeqSero. Volunteer testers from at least 16 countries submitted more than 2,700 genomes between February 2015 and September 2015.
What do you plan to do during the next reporting period to accomplish the goals? Nothing Reported

Impacts
What was accomplished under these goals? Salmonella is one of the most prevalent foodborne pathogens in the United States. As the basis for Salmonella surveillance, serotyping has been practiced for decades; currently, ~35,000 isolates are serotyped by state public health departments every year. Traditional serotyping is time-consuming (~3 days) and logistically challenging (a full set of Salmonella serotyping requires hundreds of antisera). We developed a prototype push-button bioinformatics tool, called SeqSero, that predicts Salmonella serotypes from whole genome sequencing (WGS) data in seconds to minutes. Since we launched the tool for public access in February 2015, more than 2,700 queries/genomes have been submitted by users in at least 16 countries. Federal agencies including CDC, FDA and USDA FSIS have been routinely using this tool for Salmonella serotyping, and it is also poised to become the next-generation serotyping method for national Salmonella surveillance in Denmark. As public health microbiology is being transformed by WGS, SeqSero allows any laboratory with access to WGS to perform near-full-spectrum Salmonella serotyping, a capability previously available to only a few laboratories.
Efforts in the first reporting period (5/2015-9/2015) focused on the second objective of this project: developing bioinformatics tools for high-throughput sequencing-based pathogen subtyping and characterization. A software tool and databases of Salmonella serotype determinants were developed, and a web-based user interface (www.denglab.org/SeqSero) was created for free public access. Preliminary validation on more than 4,000 genomes suggested that the tool was highly accurate (~99% for quality raw reads from isolates of confirmed serotypes), fast (instant for genome assemblies and a few minutes for raw reads) and comprehensive (theoretically, 2,389 of the 2,577 known serotypes can be determined).
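At its final step, serotyping from WGS data maps the predicted O antigen and the two flagellar (H) phases onto a serotype name. The lookup below is a deliberately tiny, hypothetical illustration of that step: it lists only five serotypes with commonly cited White-Kauffmann-Le Minor formulas, keys each by a single representative O factor, and returns an "unresolved" label otherwise. SeqSero itself works from curated serotype determinant databases and covers far more of the scheme.

```python
# Illustrative subset only; the full White-Kauffmann-Le Minor scheme lists
# thousands of antigenic formulas. Keys: (representative O factor,
# phase-1 H antigen, phase-2 H antigen), with "-" for an absent phase.
ANTIGEN_TABLE: dict[tuple[str, str, str], str] = {
    ("O:4", "i", "1,2"): "Typhimurium",   # 1,4,[5],12:i:1,2
    ("O:4", "r", "1,2"): "Heidelberg",    # 1,4,[5],12:r:1,2
    ("O:9", "g,m", "-"): "Enteritidis",   # 1,9,12:g,m:-
    ("O:7", "r", "1,5"): "Infantis",      # 6,7,14:r:1,5
    ("O:8", "e,h", "1,2"): "Newport",     # 6,8,20:e,h:1,2
}


def call_serotype(o_group: str, h1: str, h2: str) -> str:
    """Look up a serotype from predicted antigens; unknown combinations are
    reported as unresolved together with the predicted antigenic profile."""
    return ANTIGEN_TABLE.get((o_group, h1, h2), f"unresolved ({o_group}:{h1}:{h2})")


if __name__ == "__main__":
    print(call_serotype("O:9", "g,m", "-"))  # Enteritidis
    print(call_serotype("O:4", "i", "-"))    # unresolved (O:4:i:-)
```

Typhimurium and Heidelberg differ here only in the phase-1 H antigen (i versus r), which illustrates why accurate allele calls for fliC matter so much to the final serotype.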

Publications

  • Type: Journal Articles Status: Published Year Published: 2015 Citation: Salmonella serotype determination utilizing high-throughput genome sequencing data. J Clin Microbiol. 2015 May;53(5):1685-92. doi: 10.1128/JCM.00323-15. Epub 2015 Mar 11.