Source: DELAWARE STATE UNIVERSITY submitted to NRP
FACT: HARNESSING DATA SCIENCE TECHNIQUES IN FOOD SCIENCE RESEARCH
Sponsoring Institution
National Institute of Food and Agriculture
Project Status
COMPLETE
Funding Source
Reporting Frequency
Annual
Accession No.
1025610
Grant No.
2021-67022-34148
Cumulative Award Amt.
$199,900.00
Proposal No.
2020-08812
Multistate No.
(N/A)
Project Start Date
Mar 1, 2021
Project End Date
Jul 31, 2024
Grant Year
2021
Program Code
[A1541]- Food and Agriculture Cyberinformatics and Tools
Recipient Organization
DELAWARE STATE UNIVERSITY
1200 NORTH DUPONT HIGHWAY
DOVER,DE 19901
Performing Department
Human Ecology
Non Technical Summary
The application of data science has gained prominence in some disciplines but not yet in food research and development. Current advances and trends point to novel data tools, analytical approaches, and technologies that complement traditional methods in handling complex scenarios. In food science research, these scenarios are amplified by the vast assortment of foods consumed in varying quantities and combinations by consumers in different regions and at different stages of life. In addition to this wide variability, the methods commonly used to navigate and interpret these data remain mostly cumbersome and time-consuming. The proposed interdisciplinary project seeks to formulate and use integrated machine learning (ML) approaches and quantitative analysis concepts in food research. This project leverages domain-specific knowledge and data-driven approaches to aid food product and process design, consumer health, and safety, and to build and maintain optimized models that accelerate discovery and innovation. This project uses innovative data science approaches in food science research, education, and training, and equips both students and faculty with computational thinking skills as well as skills to make inferences from collected data. This project pursues the following objectives: (1) develop an approach for working on streaming data in food research and use selected ML algorithms, mainly clustering and predictive analytics, to explore specified scenarios; (2) identify appropriate supervised learning tools; (3) develop methods of identifying distributional characteristics of various variables; and (4) train students and foster faculty professional development with competencies in data science. As examples, we present the application of data-driven approaches to four food chemistry research case studies to spark new directions.
The food industry, government, and other stakeholders stand to benefit from working cross-functionally to unlock opportunities that support data-driven and scientific decision making in food research and development and policy.
Animal Health Component
60%
Research Effort Categories
Basic
40%
Applied
60%
Developmental
(N/A)
Classification

Knowledge Area (KA) | Subject of Investigation (SOI) | Field of Science (FOS) | Percent
501 | 5010 | 2000 | 65%
901 | 1899 | 2080 | 35%
Goals / Objectives
The goal of this project, a collaboration between DSU and the University of Delaware (UD), is to integrate the application of data science in food science research and training to aid process design and product development, and to provide students and faculty with cross-disciplinary educational and professional development and hands-on learning opportunities in real-life food science and data science projects that can be incorporated into the latter's teaching.
Project Methods
Sub-project #1 will assess the impact of nutrient (sodium, total fatty acid, saturated fatty acid (SFA), and total sugar) variation on intakes. One hundred popular ice cream flavors (both store and name brands) will be analyzed for these nutrients in the laboratory. We will also use data from databases, manufacturers' websites, and online resources, and categorize ice creams by flavor and macronutrients of interest using ML approaches such as PCA and clustering. This sub-project will lead to the development of a comprehensive database that compares nutrition facts labels, addresses gaps in information, monitors nutrient content at the brand level, and potentially influences consumer behavior and an overall reduction of these components.
Sub-project #2 will assess the impact of dietary change/consumption of baked products or snacks made from legumes/pulses on nutritional intake, and quantify the possible impact on consumers' health, using consumption survey data, composition, and amount consumed to model nutritional intake.
Sub-project #3: Margarine formulated in our lab using unconventional oils and fats (njangsa seed, bush mango, and palm kernel oil), together with individual manufactured and restaurant foods sold in the US that are likely to contain TFA, will be obtained and their fatty acid composition determined experimentally in our lab (Food Chemistry Lab, DSU). Changes in the content of TFA, SFA, cis unsaturated fat, and total fat will be reported. Product development can be time-consuming; data science tools can be applied to streamline these efforts.
Sub-project #4: We will develop a model that predicts the stability of recovered oils stored over 6 months at 4 and 25 °C based on experimental data for selected parameters, and estimate the probability associated with this prediction.
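The clustering step described for Sub-project #1 might look roughly like the following minimal sketch. It covers only the clustering part (not PCA), uses plain k-means with a deterministic farthest-point start, and runs on invented per-serving values; the product names, nutrient numbers, and the choice of two clusters are all assumptions made for illustration and do not come from the project's data.

```python
from statistics import mean, pstdev

# Hypothetical per-serving label values: (sodium mg, saturated fat g, total sugar g).
products = {
    "vanilla_store": (55, 9.0, 22),
    "vanilla_name": (60, 8.5, 23),
    "chocolate_name": (65, 10.0, 25),
    "caramel_store": (70, 9.5, 24),
    "light_vanilla": (45, 2.0, 12),
    "light_chocolate": (50, 2.5, 13),
    "light_berry": (40, 1.5, 11),
    "light_caramel": (55, 3.0, 14),
}

def zscore_columns(rows):
    """Scale each nutrient to mean 0 and unit variance so no unit dominates."""
    stats = [(mean(col), pstdev(col)) for col in zip(*rows)]
    return [[(x - m) / s for x, (m, s) in zip(row, stats)] for row in rows]

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, max_iter=50):
    """Lloyd's algorithm with a deterministic farthest-point initialisation."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [mean(col) for col in zip(*members)]
    return labels, centroids

names = list(products)
scaled = zscore_columns([products[n] for n in names])
labels, centroids = kmeans(scaled, k=2)
for name, label in zip(names, labels):
    print(f"{name:16s} cluster {label}")
```

Standardizing first matters here because sodium is reported in milligrams and the other nutrients in grams; without it the sodium column would dominate the distance calculation.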

Progress 03/01/24 to 08/01/24

Outputs
Target Audience: The results of our work were shared with peers, colleagues, and professionals from various organizations during a conference.
Changes/Problems: Nothing Reported
What opportunities for training and professional development has the project provided? The project provided hands-on mentoring opportunities for high school and undergraduate students to improve their proficiency in working with secondary data and ML tools and applications.
How have the results been disseminated to communities of interest? Our outreach activities included improving the understanding of students (high school and undergraduate) and industry personnel, and increasing their interest in leveraging ML in several areas, including food science research, and in the range of careers available in science.
What do you plan to do during the next reporting period to accomplish the goals? Nothing Reported

Impacts
What was accomplished under these goals? The project provided an opportunity to train one master's student, who has successfully completed the program and is currently pursuing a Ph.D. degree. The project provided optimal experiences for high school and undergraduate students who will potentially pursue advanced degrees or employment in the field of Food and Ag. Science. The project provided underrepresented students with access and exposure to research opportunities to pursue advanced degrees, and faculty with professional development opportunities for teaching and research. Faculty collaborations in Food Science and Data Science were also strengthened, ensuring multidisciplinary research opportunities for both students and faculty.

Publications

  • Type: Conference Papers and Presentations Status: Other Year Published: 2024 Citation: Aryee, A.N.A., Tawiah, N.A. and Tachie, C. Classification and authentication of edible oil-based products: A comparison of machine learning algorithms combined with FTIR spectroscopy. 2024 AOCS Annual Meeting & Expo. April 28 - May 1, 2024, Montréal, Quebec, Canada.


Progress 03/01/21 to 08/01/24

Outputs
Target Audience: The project recruited and trained undergraduate and graduate students and established new collaborations.
Changes/Problems: Nothing Reported
What opportunities for training and professional development has the project provided? Nothing Reported
How have the results been disseminated to communities of interest? Nothing Reported
What do you plan to do during the next reporting period to accomplish the goals? Nothing Reported

Impacts
What was accomplished under these goals? The project provided results that have a significant social impact. The approach can replicate the look of different foods and address health-related issues connected to a product's contents. This approach is especially important for low-income areas where the population has little knowledge of the food they are consuming. Health professionals can also use the current approach to provide nutrition guidance for different populations.

Publications

  • Type: Peer Reviewed Journal Articles Status: Published Year Published: 2024 Citation: Aryee, A. N., Tachie, C., & Kaleda, A. (2024). Formation of volatile compounds in salt-mediated naturally fermented cassava. Food Chemistry: X, 102101.


Progress 03/01/23 to 02/29/24

Outputs
Target Audience: The current research presents a machine learning (ML) framework in a sub-area of food science. The project has introduced students to the application of ML tools. Some research results have already been introduced in the curriculum and shared with the broader community. The project will enable students to pursue more classes and training in ML applications across various agriculture projects and research. In the second year of our project, we expanded our target audience to include new collaborators from academia and student trainees (two graduate students and two undergraduates). The group published six manuscripts in peer-reviewed journals and attended three conferences. Student participants attended conferences that gave them access to a diverse range of sessions, workshops, and presentations delivered by industry experts. By attending these sessions, students were able to enhance their knowledge and stay updated on the latest trends, research, and best practices in their field. The conferences offered a platform to connect and network with peers, colleagues, and professionals from various organizations. Building new relationships and expanding professional networks can lead to collaborations, mentorship opportunities, and future career prospects. The conferences included hands-on workshops and training sessions focused on developing specific skills or learning new tools and techniques in AI and Data Science. These interactive sessions helped the project team and students acquire practical skills that can be applied in their work.
Changes/Problems: Nothing Reported
What opportunities for training and professional development has the project provided?
Publication Support: Guidance and assistance were provided in editing and structuring research publications, ensuring that they met academic writing standards.
The objective was to enhance clarity, coherence, and readability, thereby helping students effectively communicate their research findings.
Computational Technology Utilization: Assistance was provided in using computational tools relevant to data science projects. This built students' proficiency in data analysis and visualization, enabling them to analyze and interpret complex data sets effectively.
Proofreading and Editing: Proofreading services were offered to students, providing them with constructive feedback on grammar, syntax, and structure. This helped enhance the quality and coherence of their written work. Additionally, we reviewed and refined technical content to ensure accuracy and precision in conveying data science concepts.
Impact: Students were equipped with the skills and knowledge needed for effective publication writing, computational analysis, and machine learning implementation. Guidance and support enhanced students' confidence in using computational tools and algorithms, fostering a deeper understanding of data science methodologies. Additionally, we improved the communication of data science concepts by providing effective proofreading and terminology clarification.
Conclusion: Through our assistance and guidance, we were able to enhance the students' capabilities in various aspects of data science research and publication. Our objective was to equip them with essential skills in publication writing, computational technology, machine learning algorithms, proofreading, and terminology, thereby strengthening their proficiency and readiness to succeed in their data science endeavors. The conferences brought together professionals from different backgrounds and perspectives. Attending presentations and discussions can expose individuals to innovative ideas, approaches, and solutions that they can apply to their own work.
Participants had the opportunity to present their own work through poster sessions, oral presentations, or panel discussions. These hands-on activities improved participants' presentation skills, boosted confidence, and provided valuable feedback from peers and experts. Presenting research findings and leading discussions enhanced participants' skills and professional visibility within their field, and helped establish their credibility and recognition as thought leaders. The conferences served as a platform for finding potential collaborators and partners for research projects, initiatives, or business ventures. Building connections with like-minded individuals can lead to fruitful collaborations in the future.
How have the results been disseminated to communities of interest? We have disseminated the results of our work through (1) six peer-reviewed papers published in reputable journals and (2) three public presentations to diverse audiences at professional conferences over this project period, describing the aims of integrating data science techniques in food science research and what we have achieved so far. The project team has provided opportunities for connecting our research to some middle and high school students at the Early College High in Dover. One of the co-PIs presented the project results to environmental engineering students in a seminar. The seminar showed that the ideas presented in the project can be replicated in other subject areas, including environmental engineering and science.
What do you plan to do during the next reporting period to accomplish the goals? We plan to convene two speaking opportunities on novel uses of ML/AI in food systems. We also plan to on-board another graduate student and an undergraduate who will be partially supported by this grant.
We will also incorporate the evaluator's reviews to refine initial plans for future grant proposals, and summarize the ML/AI areas explored that are critically needed to advance food science research and the application of AI for sustainable product development. We will continue developing and refining curriculum modules for high school and undergraduate students in various modalities. We are also planning to validate the results using out-of-sample data.

Impacts
What was accomplished under these goals? The project team collaborated across three main themes: 1) determining the ability of ML algorithms, including artificial neural networks (ANNs), decision trees (DTs), k-nearest neighbors (KNNs), and support vector machines (SVMs), to assess the variability in fatty acid classes (SFA, MUFA, and PUFA) in US snacks consumed over a selected period. These approaches proved to be time-efficient and cost-effective for predicting the nutritional value of the snacks; 2) assessing the combined utility of ATR-FTIR spectroscopy and ML techniques to identify and classify pure njangsa seed oil (NSO), palm kernel oil (PKO), coconut oil (CCO), and njangsa seed-palm kernel oil (NSOPKO) and njangsa seed-coconut oil (NSOCCO) margarines. Additionally, this theme sought to quantify the degree of adulteration in each oil and margarine using ML regression models, with sunflower oil and canola-flax seed oil margarine as adulterants. The combined use of FTIR spectroscopy and ML techniques produced models that demonstrated the qualitative classification of pure oils and the prediction of adulteration in oils and margarines. PCA was integrated with the ML methods to increase classification accuracy for pure and adulterated oils and margarines by selecting the features that described most of their variance and summarizing them into two PCs, which ensured efficient sample segregation. FTIR spectroscopy avoided the need for laborious and complicated sample preparation, making it a rapid and simple method of analysis. The demonstrated fingerprinting method suggests that ML methods in conjunction with FTIR spectroscopy can reliably classify oils and margarines and quantify adulterants in them, and could be further improved for applications in quality control settings to quickly authenticate new products; and 3) predicting the quality of foods and beverages formulated with plant-based ingredients using Nutri-Score and ML techniques.
ML techniques were used to connect different datasets to insights about the quality of plant-based foods. Faster determination of these nutrients in foods through these models could promote intervention strategies by regulatory bodies to generate new or combined ingredients that can minimize consumers' calorie intake from snacks. It will increase awareness of the healthiness of different foods and cater to consumers' demand for personalized nutrition. Deep learning concepts could be developed for other foods that rely on tedious analytical/instrumentation methods, to save time and minimize waste. Collaboration with a multidisciplinary team on implementing ML and AI techniques, together with the availability of a wide range of open-source datasets on food quality and processing, allowed ML techniques to be evaluated for developing predictive models for food product developers and producers. This process was guided by our multidisciplinary leadership team, who provided guidance on strategy, knowledge elicitation, data collection needs, data gathering, gap analysis, and pivot decision points. One graduate student thesis was supervised on integrating ML techniques with primary and secondary data on food quality and food authentication. The students used data mined from the National Health and Nutrition Examination Survey (NHANES) and data collected from a study using FTIR spectroscopy to develop ML models. The project results have a significant social impact. The approach can replicate the look of different foods and address health-related issues connected to a product's contents. For instance, this approach is especially important for low-income areas where the population has little knowledge of the food they are consuming. Health professionals can also use the current approach to provide nutrition guidance for different populations.
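The idea of summarizing spectra into two principal components before classification can be illustrated with a small, self-contained sketch. This is not the project's pipeline: the "spectra" are synthetic single-peak profiles, the peak positions and heights assigned to NSO, PKO, and CCO are invented, PCA is computed with an orthogonalized power iteration, and a nearest-centroid rule stands in for the ML classifiers named in the report.

```python
import random

def make_spectrum(peak_index, height, rng, n_points=20, noise=0.02):
    """Hypothetical absorbance profile: one peak plus small Gaussian noise."""
    return [(height if i == peak_index else 0.0) + rng.gauss(0.0, noise)
            for i in range(n_points)]

def column_means(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def covariance(centered):
    """Sample covariance matrix of mean-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def principal_components(cov, k, rng, iters=300):
    """Orthogonalized power iteration: the k leading unit eigenvectors of cov."""
    comps = []
    for _ in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in range(len(cov))]
        for _ in range(iters):
            w = [dot(row, v) for row in cov]
            for c in comps:  # remove directions already found
                proj = dot(w, c)
                w = [wi - proj * ci for wi, ci in zip(w, c)]
            norm = dot(w, w) ** 0.5
            v = [wi / norm for wi in w]
        comps.append(v)
    return comps

def project(row, means, comps):
    """Scores of one spectrum on the retained components."""
    centered = [x - m for x, m in zip(row, means)]
    return [dot(centered, c) for c in comps]

rng = random.Random(0)
# Invented (peak position, peak height) per oil class, for illustration only.
classes = {"NSO": (3, 1.0), "PKO": (9, 1.5), "CCO": (15, 2.0)}
train = [(lab, make_spectrum(p, h, rng))
         for lab, (p, h) in classes.items() for _ in range(10)]
test = [(lab, make_spectrum(p, h, rng))
        for lab, (p, h) in classes.items() for _ in range(5)]

means = column_means([s for _, s in train])
centered = [[x - m for x, m in zip(s, means)] for _, s in train]
pcs = principal_components(covariance(centered), 2, rng)

# Class centroids in the two-dimensional score space.
centroids = {lab: column_means([project(s, means, pcs)
                                for l, s in train if l == lab])
             for lab in classes}

def classify(spectrum):
    z = project(spectrum, means, pcs)
    return min(centroids,
               key=lambda lab: sum((a - b) ** 2 for a, b in zip(z, centroids[lab])))

accuracy = sum(classify(s) == lab for lab, s in test) / len(test)
print(f"held-out accuracy on synthetic spectra: {accuracy:.2f}")
```

The component directions and column means are fitted on the training spectra only and then reused to project the held-out spectra, which keeps the evaluation free of information from the test samples.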

Publications

  • Type: Journal Articles Status: Published Year Published: 2023 Citation: Tachie, C., Nwachukwu, I.D. and Aryee, A.N.A. (2023). Trends and innovations in the formulation of plant-based foods. Food Production, Processing and Nutrition, 5, 16; https://doi.org/10.1186/s43014-023-00129-0
  • Type: Journal Articles Status: Published Year Published: 2023 Citation: Tachie, C., Tawiah, N.A. and Aryee, A.N.A. (2023). Using machine learning models to predict the quality of plant-based foods. Current Research in Food Science, 7: 100544; https://doi.org/10.1016/j.crfs.2023.100544
  • Type: Journal Articles Status: Published Year Published: 2023 Citation: Tachie, C.Y.E., Obiri-Ananey, D., Tawiah, N.A., Attoh-Okine, N. and Aryee, A.N.A. (2023). Machine learning approaches for predicting fatty acid classes in popular US snacks using NHANES data. Nutrients, 15(15): 3310; https://doi.org/10.3390/nu15153310
  • Type: Journal Articles Status: Published Year Published: 2023 Citation: Wei, Z., Ananga, A., Ukuku D.O. and Aryee A.N.A. (2023). High salt concentration affects the microbial diversity of cassava during fermentation as revealed by 16S rRNA gene sequencing. Fermentation, 9(8): 727; https://doi.org/10.3390/fermentation9080727
  • Type: Journal Articles Status: Published Year Published: 2023 Citation: Tachie, C.Y.E., Obiri-Ananey, D., Alfaro-Cordoba, M., Tawiah, N.A. and Aryee, A.N.A. (2024). Classification of oils and margarines by FTIR spectroscopy in tandem with machine learning. Food Chemistry, 431: 137077; https://doi.org/10.1016/j.foodchem.2023.137077
  • Type: Journal Articles Status: Accepted Year Published: 2024 Citation: Tachie, C., Onuh, J.O. and Aryee, A.N.A. Nutritional and potential health benefits of fermented proteins. Journal of the Science of Food and Agriculture, 104(3): 1223-1233; https://doi.org/10.1002/jsfa.13001


Progress 03/01/22 to 02/28/23

Outputs
Target Audience: The project targets underrepresented and underserved communities and majors/disciplines not traditionally associated with data science. The project provided science-based knowledge through classroom instruction, laboratory instruction, or practicum experiences; development of curriculum or innovative teaching methodologies; internships; workshops; experiential learning opportunities; and outreach.
Changes/Problems: A major problem was the delay in on-boarding students due to COVID-19.
What opportunities for training and professional development has the project provided? Training activities: Both undergraduate and graduate students are benefiting, with faculty and collaborators serving as mentors at various stages of the project. Professional development activities: Faculty have participated in workshops, seminars, study groups, and individual study to increase knowledge, such as the SMART-DART: Health Equity: Supporting Minority and Regional Training in Data & AI for Researchers of Tomorrow: Health Equity Cohort.
How have the results been disseminated to communities of interest? The results have been disseminated through presentations and manuscripts.
What do you plan to do during the next reporting period to accomplish the goals? We plan to revisit the regression analysis using the results from both the EDA and some of the ML algorithms, and to implement a novel classification scheme and categorical variable encoding for open-source data on food and nutrition. We also plan to complete and submit other manuscripts for peer review.

Impacts
What was accomplished under these goals? We have incorporated machine learning techniques in student training and, through training, provided faculty with opportunities to develop data science approaches for food science research.
  • Exploratory Data Analysis (EDA) with experimental and open-source (National Health and Nutrition Examination Survey (NHANES)) data
  • Regression analysis of selected variables performed. Prior to performing the analysis, the covariates were carefully selected based on their significance.
  • Multi-class classification system for commonly consumed snacks in the US using ML algorithms such as support vector machine (SVM), decision tree (DT), light gradient-boosting machine (LightGBM), K-nearest neighbor (KNN), logistic regression (LR), random forest (RF), and artificial neural networks (ANN) to predict the amount of the fatty acid classes in these snacks
  • K-fold cross-validation to train and test data sets
  • Modeling of complex non-linear data sets by incorporating interactions between sparse matrices and nutritional variables, such as fatty acids and snacks, to find non-linear relationships between the outcomes that conventional regression models might miss
  • Assessing discrimination between authentic and adulterated oils and margarines using a combination of FTIR and chemometrics (machine learning algorithms)
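As a rough illustration of k-fold cross-validation for a multi-class classifier of the kind listed above, the sketch below evaluates a k-nearest-neighbor classifier on synthetic data. Everything about the data is an assumption: the two features (total fat and sugar per 100 g), the three class labels, the class centres, and the spread are invented, and the report does not state the number of folds or neighbors that were used.

```python
import random
from collections import Counter

def make_dataset(rng, n_per_class=20):
    """Synthetic snacks: (total fat g, sugar g per 100 g) -> SFA level (invented)."""
    centres = {"low": (5.0, 10.0), "medium": (15.0, 25.0), "high": (30.0, 40.0)}
    data = []
    for label, (fat, sugar) in centres.items():
        for _ in range(n_per_class):
            data.append(((rng.gauss(fat, 2.0), rng.gauss(sugar, 3.0)), label))
    rng.shuffle(data)  # mix classes so every fold sees all of them
    return data

def knn_predict(train, x, k=5):
    """Majority vote among the k training samples closest to x."""
    nearest = sorted(
        train, key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], x))
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def k_fold_accuracy(data, n_folds=5, k=5):
    """Hold out each fold once, train on the rest, and return per-fold accuracy."""
    folds = [data[i::n_folds] for i in range(n_folds)]
    scores = []
    for i, test_fold in enumerate(folds):
        train = [item for j, fold in enumerate(folds) if j != i for item in fold]
        correct = sum(knn_predict(train, x, k) == y for x, y in test_fold)
        scores.append(correct / len(test_fold))
    return scores

rng = random.Random(42)
data = make_dataset(rng)
scores = k_fold_accuracy(data)
print("fold accuracies:", [round(s, 2) for s in scores])
print("mean accuracy:", round(sum(scores) / len(scores), 2))
```

Each sample is used for testing exactly once, so the mean of the fold accuracies gives a less optimistic estimate than scoring the model on the data it was trained on. With real nutrient data the features would normally be standardized before computing distances.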

Publications

  • Type: Conference Papers and Presentations Status: Submitted Year Published: 2023 Citation: Tachie, C., Attoh-Okine, N.O., Alfaro-Córdoba, M. and Aryee, A.N.A. Application of FTIR spectroscopy and chemometrics for the authentication of oils and margarines (Submitted for presentation at the 2023 AOCS Annual Meeting & Expo 01/13/2023)
  • Type: Conference Papers and Presentations Status: Other Year Published: 2023 Citation: Aryee, A.N.A. Multidisciplinary opportunities in Food Science Research. Tuskegee University, Seminar in Food and Nutritional Sciences, January 25, 2023
  • Type: Journal Articles Status: Awaiting Publication Year Published: 2023 Citation: Tachie, C., Nwachukwu, I.D. and Aryee, A.N.A. Trends and innovations in the formulation of plant-based foods. Food Production, Processing and Nutrition. DOI: 10.1186/s43014-023-00129-0 (Accepted 12/26/2022)
  • Type: Journal Articles Status: Submitted Year Published: 2023 Citation: Tachie, C., Attoh-Okine, N.O., Tawiah, N.A. and Aryee, A.N.A. Predicting Fatty Acid Classes in Popular US Snacks Using NHANES Data and Machine Learning Approaches (Submitted to Bioengineered 12/07/2022)
  • Type: Journal Articles Status: Under Review Year Published: 2023 Citation: Tachie, C., Attoh-Okine, N.O., Tawiah, N.A. and Aryee, A.N.A. Combination of FTIR spectroscopy and machine learning for non-destructive product identification (To be submitted to Journal of Food Control)


Progress 03/01/21 to 02/28/22

Outputs
Target Audience: For the first period of performance of this grant, which is being implemented at DSU and UD, we have submitted an abstract to the Association of Research Directors (ARD), and a review manuscript is in preparation.
Changes/Problems: The project has enhanced/intensified its linkage with our partner at UD, and other faculty are now expressing interest.
What opportunities for training and professional development has the project provided?
  • The idea of applying Data Science and Machine Learning to Food Science was presented to undergraduate students at the College of Agriculture at the University of Delaware
  • A graduate student is developing a GitHub site for Data Science and ML applications in Food Science
How have the results been disseminated to communities of interest?
  • Preliminary results were presented at the School of Agriculture, UD
  • Some results will be presented during ARD
What do you plan to do during the next reporting period to accomplish the goals?
  • Additional data will be collected from food databases and the literature
  • Additional Exploratory Data Analysis to be performed on the data
  • On-board another graduate student

Impacts
What was accomplished under these goals?
  • Data were collected from food databases and the literature
  • Exploratory Data Analysis was completed on the data
  • Preliminary remarks were presented based on the results
  • A graduate student at UD presented the preliminary results during Graduate Seminar

Publications

  • Type: Journal Articles Status: Published Year Published: 2022 Citation: Asuzu, P.C., Trompeter, N.S., Cooper, C.R., Besong, S.A. and Aryee, A.N.A. (2022). Cell culture-based assessment of toxicity and therapeutics of phytochemical antioxidants. Molecules 2022, 27, 1087.
  • Type: Journal Articles Status: Published Year Published: 2022 Citation: Aryee, A.N.A., Akanbi, T.O., Nwachukwu, I.D. and Gunathilake, T. (2022). Perspectives on preserving lipid quality and strategies for value enhancement. Current Opinion in Food Science, 44: 100802.
  • Type: Journal Articles Status: Under Review Year Published: 2022 Citation: Pre-processing Treatments Improved the Physicochemical Properties of Bambara Groundnut Flours and Preference of Formulated Cake
  • Type: Conference Papers and Presentations Status: Accepted Year Published: 2022 Citation: Enhancing Oxidative Stability and Delivery of Njangsa (Ricinodendron heudelotii) Seed Oil by Encapsulation
  • Type: Conference Papers and Presentations Status: Accepted Year Published: 2022 Citation: Machine Learning Approaches to Predict Micronutrients Content in Plant-based Foods
  • Type: Journal Articles Status: Under Review Year Published: 2022 Citation: Profiling Carotenoids and Other Bioactives in Selected Starchy Staples
  • Type: Journal Articles Status: Under Review Year Published: 2022 Citation: In vitro Assessment of Efficacy and Cytotoxicity of Prunus africana Extracts on Prostate Cancer C4-2 Cells