Food Science & Human Nutrition
Non Technical Summary
Fresh produce has been repeatedly linked to high-profile foodborne disease outbreaks, leading to illness, loss of life, significant economic loss, and erosion of consumer confidence. Conventional risk assessment methods have attempted to model the key events leading to contamination of produce by human pathogens. However, the critical need for early identification of emerging produce safety risks and early warning to the public has not been met. In this age of the Internet of Things (IoT), the rapid proliferation of Internet content, especially real-time social media, and the emergence of game-changing big data technologies provide unprecedented opportunities to detect emerging produce safety issues and alert the public at an early stage. The overall goal of this study is to develop an innovative big data analytics infrastructure for fresh produce safety risk prediction and early warning based on cyber-informatics technologies that exploit multi-source big data, including social media, news media, and government reports, to reduce the incidence of foodborne diseases associated with consumption of fresh produce. The specific objectives are to: 1) develop a real-time data retrieval mechanism to extract relevant information from diverse online digital sources; 2) design big data storage that fuses risk pattern data sets; 3) discover event patterns about safety risks in fresh produce chains; 4) design machine learning models for predicting outbreaks early; and 5) implement a web-based early warning interface that lets stakeholders visually explore risk levels.
Animal Health Component
Research Effort Categories
Goals / Objectives
The goal of this study is to develop an innovative big data analytics infrastructure for the modeling of fresh produce safety risks and the early warning of fresh produce safety outbreaks. The resulting infrastructure applies state-of-the-art cyber-informatics technologies that leverage multi-source big data, including social media, news media, and government reports, to reduce the incidence of foodborne diseases associated with the consumption of fresh produce.
The big data analytics infrastructure, called ESP (for Early Warning System for Fresh Produce), will be composed of several core technologies. A data collector retrieves information related to foodborne outbreaks from online digital sources by identifying relevant posts and reports and then extracting structured properties from these unstructured posts (Objective 1). The project will work with a rich variety of data sources, including social media, news outlets, and authoritative websites of the CDC, FDA, USDA, and other local and state government organizations. As a foundation for extracting relevant reports from these sources, we have built an initial lexicon for food safety risks. This lexicon, developed by the PI's group, will be refined through a crowdsourcing study on Mechanical Turk, in which human labelers establish the ground truth, and by employing deep learning to generalize the food safety vocabulary. Leveraging this lexicon, deep learning models such as RNNs and Transformers will then be used to extract relevant food safety incidents from the identified digital data sources. The extracted data will be uploaded into our big data server (Objective 2). The big data server will be based on a spatiotemporal data model that captures the extracted food safety incidents indexed by location and time of occurrence. The data extractors will be run periodically to extract, clean, and upload additional incident data into our integrated data repository for analysis. We will optimize the processing and data management strategies of this data server, as needed, to ensure that the system maintains practical performance as the database grows.
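As a rough illustration of the collection and storage steps described above, the following minimal sketch filters posts against a small lexicon and indexes the matching incidents by location and time. It is not the ESP implementation: the lexicon terms, the `Incident` and `IncidentStore` names, the use of a state code as the location key, and the choice of ISO week as the time bucket are all assumptions made for the example, and simple substring matching stands in for the deep learning extractors the project proposes.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date

# Hypothetical mini-lexicon; the project's actual lexicon is not given in the text.
LEXICON = {"salmonella", "e. coli", "listeria", "recall", "outbreak"}


@dataclass(frozen=True)
class Incident:
    text: str   # raw post or report text
    state: str  # location key (assumed here: US state code)
    day: date   # date of occurrence


def is_relevant(text: str) -> bool:
    """Return True if the text mentions any lexicon term (case-insensitive)."""
    lowered = text.lower()
    return any(term in lowered for term in LEXICON)


class IncidentStore:
    """In-memory stand-in for a spatiotemporal store keyed by (state, ISO year, ISO week)."""

    def __init__(self):
        self._index = defaultdict(list)

    def add(self, incident: Incident) -> None:
        year, week, _ = incident.day.isocalendar()
        self._index[(incident.state, year, week)].append(incident)

    def query(self, state: str, year: int, week: int) -> list:
        return list(self._index.get((state, year, week), []))


def ingest(posts, store: IncidentStore) -> int:
    """Filter (text, state, day) tuples by the lexicon and store the matches."""
    kept = 0
    for text, state, day in posts:
        if is_relevant(text):
            store.add(Incident(text, state, day))
            kept += 1
    return kept
```

A periodic extractor run would then amount to calling `ingest` on each new batch of posts; a production system would replace the in-memory dictionary with a database that supports spatial and temporal indexes.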
To identify potential risks of fresh produce safety outbreaks from this integrated data store, state-of-the-art machine learning techniques will be applied to the data, including both unsupervised methods (Objective 3) and supervised methods (Objective 4) that learn models for capturing food safety risk and for predicting food safety outbreaks. Lastly, a web-based interface to our data server will allow stakeholders to visually explore the levels of risk and the outbreaks predicted by our ESP infrastructure (Objective 5).
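To make the idea of unsupervised risk detection concrete, the sketch below flags weeks in which the number of reported incidents rises sharply above a trailing baseline. This is a simple z-score rule chosen for illustration, not one of the machine learning models the project will develop; the function name `flag_spikes`, the four-week window, the threshold of 3, and the floor on the standard deviation are all assumptions.

```python
from statistics import mean, stdev


def flag_spikes(weekly_counts, window=4, threshold=3.0):
    """Return indices of weeks whose count is far above the trailing baseline.

    Each week is compared with the mean and sample standard deviation of the
    preceding `window` weeks; weeks scoring at or above `threshold` are flagged.
    """
    flagged = []
    for i in range(window, len(weekly_counts)):
        baseline = weekly_counts[i - window:i]
        mu = mean(baseline)
        # Floor the spread at 1.0 so a flat baseline does not divide by zero
        # or turn a one-report change into a huge score.
        sigma = max(stdev(baseline), 1.0)
        z = (weekly_counts[i] - mu) / sigma
        if z >= threshold:
            flagged.append(i)
    return flagged
```

Applied to the weekly incident counts for one region, the flagged indices could feed an early warning display, with supervised models later refining which flagged weeks correspond to confirmed outbreaks.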