Source: UNIV OF MARYLAND submitted to
DSFAS: MASH - MACHINE LEARNING AND ADVANCED OMICS DATA ANALYSIS FOR IMPROVED FOOD SAFETY AND PUBLIC HEALTH
Sponsoring Institution
National Institute of Food and Agriculture
Project Status
NEW
Funding Source
Reporting Frequency
Annual
Accession No.
1032347
Grant No.
2024-67021-42527
Project No.
MD-NFSC-11654
Proposal No.
2023-11654
Multistate No.
(N/A)
Program Code
A1541
Project Start Date
Aug 1, 2024
Project End Date
Jul 31, 2027
Grant Year
2024
Project Director
Pradhan, A.
Recipient Organization
UNIV OF MARYLAND
(N/A)
COLLEGE PARK, MD 20742
Performing Department
(N/A)
Non Technical Summary
Approximately 1 in 6 Americans is affected each year by food contaminated with dangerous microorganisms. Recurring, emerging, and persistent (REP) microorganisms such as Escherichia coli, Salmonella enterica, and Listeria monocytogenes can re-emerge periodically in food systems, causing repeated acute outbreaks, or persist and cause illnesses over long periods of time. Thus, there is an urgent need to develop strategies to predict the spread of these microbial contaminants in our food supply and subsequently control them. Traditionally, microbial behavior has been modeled using simple mathematical and statistical models. In recent years, however, genomic and other 'omics'-based methods have increasingly been used to monitor, identify, and characterize pathogenic microorganisms, introducing novel dimensions to microbial data. It is critical to develop analytical toolkits or risk mitigation strategies that identify useful patterns in these genomic data in order to effectively predict and manage the food safety risk of REP pathogens and improve public health. This proposal aims to develop novel tools and pipelines to analyze and predict the presence and behavior of REP pathogens under the various conditions observed in the agricultural and food ecosystem. We will leverage publicly available genomic and phenotypic data for REP pathogens from food production and processing environments, metagenomics data from sampling activities, and publicly available environmental data to delineate molecular markers associated with the persistence and evolution of REP pathogens. We will employ a combination of machine learning, bioinformatics analysis, and advanced computational models to achieve our objectives. The resulting models and tools will be combined into a reproducible pipeline to identify genetic determinants of pathogen persistence and recurrence in the food and agricultural domain. This in turn will support better risk management decisions to improve food safety and protect public health from REP pathogens.
Animal Health Component
0%
Research Effort Categories
Basic
20%
Applied
80%
Developmental
(N/A)
Classification

Knowledge Area (KA)    Subject of Investigation (SOI)    Field of Science (FOS)    Percent
712                    4010                              1040                      25%
712                    4010                              2080                      25%
712                    4010                              2090                      20%
712                    1430                              1100                      15%
712                    3299                              1100                      15%
Goals / Objectives
The overall goal of this project is to develop novel methods, pipelines, and tools to predict genomic risk factors for food safety-related outcomes, such as microbial presence/persistence and interplay with other microorganisms in the agricultural and food processing ecosystems. This project aims to improve the safety of our food supply by developing a reproducible, easy-to-use analytical pipeline to detect and predict the presence and behavior of recurring, emerging, and persistent (REP) pathogens using novel computational methods. Specifically, we will use machine learning (ML), computational methods, and the latest experimental techniques to achieve the following objectives.
Objective 1: Develop a bioinformatics pipeline to identify genetic patterns in foodborne pathogens associated with changes in environmental conditions.
Objective 2: Develop an ML-based predictive model to detect pathogenic microorganisms in microbiomes (metagenomes) from food production environments.
Objective 3: Demonstrate real-world applications of the computational and modeling pipeline by predicting microbial persistence in sustainable-leaning farm and processing environments.
Objective 4: Launch an artificial intelligence (AI)-based dashboard to detect the genetic patterns of foodborne pathogens present in microbial communities within the food production ecosystem.
Project Methods
To achieve the objectives proposed in our project, we will employ an integrated approach. Our multidisciplinary team brings together state-of-the-art techniques, including machine learning, computational and bioinformatics tool-building, and current molecular diagnostics and statistical approaches. These will support our proposed research activities, including REP pathogen identification, pattern recognition, and prediction based on prevalent environmental factors, to improve foodborne disease risk management.
Objective 1: We aim to characterize the genetic changes that could be indicative of REP persistence by analyzing whole genome sequencing (WGS) data using bioinformatics analytics and ML. Genome data for REP pathogens such as Salmonella, Listeria, and E. coli relevant to poultry and leafy greens will be obtained from publicly available databases (e.g., the U.S. National Center for Biotechnology Information's (NCBI) Pathogen Detection and the U.S. FDA's GenomeTrakr). We will develop a reproducible pipeline for genome assembly, annotation, and bioinformatics analysis, followed by ML-based predictive modeling to define the impact of external variables on REP genomes.
Objective 2: Microbiome analysis will be performed on the shotgun-sequenced samples using both k-mer-based and alignment-based methods. Machine learning classifiers that are suitable for microbiome data will be employed to predict the presence or absence of Salmonella or E. coli in environmental samples from the chicken processing or leafy green ecospheres. Hyperparameter tuning will be performed with standardized methods such as 10-fold cross-validation. Model performance will be assessed using standard evaluation metrics.
Objective 3: Sequence data will be collected from swab samples taken during routine surveillance activities and reclaimed water-related scientific studies. Samples will be collected from sites involved in leafy greens production, soil, reclaimed water and water disbursement devices, and from microorganism-seeded leafy greens. Shotgun metagenomic sequencing will be performed using standard methods to identify pathogen presence at strategic temporal increments in the reclaimed water-irrigated leafy greens ecosphere. These data will help us understand the microbial profile that supports the growth of pathogens in this environment.
Objective 4: We will develop an interactive dashboard to interpret and analyze the various data streams and apply the advanced analytical methods developed in Objectives 1-3. This will help us apply uniform and objective criteria for defining pathogen surveillance and persistence.
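As an illustration of the Objective 2 workflow (k-mer-based features, a presence/absence classifier, and hyperparameter tuning by 10-fold cross-validation), the following is a minimal sketch in plain Python. It is not the project's pipeline: the k-mer length, the L1 distance, the nearest-neighbor classifier, the class labels, and the synthetic sequences produced by the hypothetical `toy_dataset` helper are all assumptions chosen only to keep the example self-contained. The project does not specify which classifiers or features will be used.

```python
import random
from collections import Counter

VALID_BASES = set("ACGT")

def kmer_profile(seq, k=3):
    """Normalized k-mer frequencies; windows with ambiguous bases are skipped."""
    seq = seq.upper()
    windows = (seq[i:i + k] for i in range(len(seq) - k + 1))
    counts = Counter(w for w in windows if set(w) <= VALID_BASES)
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()} if total else {}

def l1_distance(p, q):
    """L1 distance between two sparse k-mer profiles (an assumed metric)."""
    return sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in set(p) | set(q))

def knn_predict(train_X, train_y, x, n_neighbors):
    """Majority vote among the n_neighbors closest training profiles."""
    ranked = sorted(range(len(train_X)), key=lambda i: l1_distance(train_X[i], x))
    votes = Counter(train_y[i] for i in ranked[:n_neighbors])
    return votes.most_common(1)[0][0]

def kfold_indices(n_samples, n_folds=10, seed=0):
    """Yield (train_idx, test_idx) pairs for shuffled k-fold cross-validation."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::n_folds] for i in range(n_folds)]
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test

def cv_accuracy(X, y, n_neighbors, n_folds=10):
    """Cross-validated accuracy for one hyperparameter setting."""
    correct = 0
    for train, test in kfold_indices(len(X), n_folds):
        tX, ty = [X[i] for i in train], [y[i] for i in train]
        correct += sum(knn_predict(tX, ty, X[i], n_neighbors) == y[i] for i in test)
    return correct / len(X)

def toy_dataset(n_per_class=20, length=200, seed=1):
    """Hypothetical synthetic data: GC-rich vs. AT-rich random sequences."""
    rng = random.Random(seed)
    X, y = [], []
    for label, weights in (("present", (1, 3, 3, 1)), ("absent", (3, 1, 1, 3))):
        for _ in range(n_per_class):
            seq = "".join(rng.choices("ACGT", weights=weights, k=length))
            X.append(kmer_profile(seq))
            y.append(label)
    return X, y

if __name__ == "__main__":
    X, y = toy_dataset()
    # Pick the neighbor count with the best 10-fold cross-validated accuracy.
    scores = {n: cv_accuracy(X, y, n) for n in (1, 3, 5)}
    best = max(scores, key=scores.get)
    print("CV accuracy by n_neighbors:", scores, "selected:", best)
```

In practice, k-mer profiling of shotgun reads and model tuning would normally rely on dedicated bioinformatics and ML tooling. The point of the sketch is only that each candidate hyperparameter value is scored on held-out folds, and the selected model should still be evaluated on data that played no part in tuning.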