Source: California State University, Fullerton submitted to NRP
LEARNING TO ANALYZE EMERGING DATASETS FOR AGRICULTURE: UNDERSTANDING FOOD WASTE BEHAVIOR FROM SOCIAL MEDIA
Sponsoring Institution
National Institute of Food and Agriculture
Project Status
COMPLETE
Funding Source
Reporting Frequency
Annual
Accession No.
1020890
Grant No.
2019-38422-30211
Cumulative Award Amt.
$179,178.00
Proposal No.
2019-03819
Multistate No.
(N/A)
Project Start Date
Sep 1, 2019
Project End Date
Feb 28, 2023
Grant Year
2019
Program Code
[NJ]- Hispanic Serving Institutions Education Grants Program
Recipient Organization
California State University, Fullerton
800 N. State College Blvd., Room KHS-121
Fullerton, CA 92831
Performing Department
Computer Science
Non Technical Summary
The future of agriculture will be data-driven. Most undergraduate academic programs include the analysis of structured data (data that is tabular). However, emerging datasets are less structured. These datasets come from a variety of sources, including online social networks such as Twitter. The proportion of unstructured data is expected to increase. Such datasets will need to be analyzed for agricultural applications, for instance, to understand consumer behavior regarding food waste. Food waste is a problem that gets little attention. Nearly 29% of the food produced in the USA is not eaten. As hunger is a significant problem in many parts of the world, including the United States, food waste is an issue that must not be ignored. In addition, food waste entails problems with greenhouse gas emissions, water use, and health. The reasons why food waste occurs vary widely. One reason in the United States is the perception that food is no longer safe after the date stamped on the product. As increasing numbers of people express their opinions on social media networks, social network communications can be expected to reflect the true food waste perceptions and behaviors held by the users. This source of information can be systematically studied to understand specific reasons for food waste. Moreover, the study of communication patterns can help food waste reduction advocates design more effective social media-based campaigns.
The project will develop and offer co-curricular training on analyzing unstructured data in food, agriculture, natural resources and human (FANH) sciences applications. The training will be structured as an undergraduate research experience with the task of understanding food waste behavior from online social network messages. Students will be assigned specific research tasks with detailed instructions, which can be completed in their own time. The primary goal is to train the students to become skilled in working with unstructured data. The training will be interdisciplinary and include both computer science and biological sciences students. In this collaborative setting, which is increasingly reflective of real-world workflows, computer science students will demonstrate the computational steps to biological sciences students; the biological sciences students will bring the domain knowledge required to complete the research tasks. A secondary goal is to educate all students about emerging FANH data-driven applications and career opportunities.
The training will be offered to undergraduate students at California State University, Fullerton, where over 40% of the students are Hispanic. At least 50% of the participants are expected to apply to FANH positions. Training materials will be released so that the program can be adapted and replicated elsewhere. Such training will provide specialized computing skills essential to work on data-driven agriculture.
Animal Health Component
(N/A)
Research Effort Categories
Basic
(N/A)
Applied
(N/A)
Developmental
(N/A)
Classification

Knowledge Area (KA) | Subject of Investigation (SOI) | Field of Science (FOS) | Percent
903 | 5010 | 2080 | 100%
Goals / Objectives
The primary goal of the project is to train undergraduate students to become skilled in working with unstructured datasets. A secondary goal is to educate these students about emerging FANH data-driven applications and career opportunities.
The objectives of the project are:
1. Provide training in the processing of unstructured text data for FANH applications to undergraduate students. At the end of a semester-long program, students should be able to analyze a new unstructured text dataset for an FANH-related application. This objective directly supports Objectives 1 and 3 of the HSI Education Grants Program since it will provide training to students, including the over 40% Hispanic students at CSUF, in a skill that is increasingly important to FANH applications.
2. Develop teaching material for working with unstructured text data that can be freely re-used in subsequent years and in other settings outside CSUF. The teaching material will be self-contained and is designed to be adapted to the interests and expertise of other data analysis instructors. This objective supports Objective 2 of the HSI Education Grants Program since the teaching material that will be produced can be used to improve the data analysis instruction provided as part of an FANH education.
3. Understand the beliefs, actions, and business practices leading to food waste by analyzing social media data by means of an automated methodology. Though this objective does not directly support any of the objectives of the HSI Education Grants Program, measuring its outcome will validate the usefulness of the skill (analyzing unstructured text data) to FANH sciences.
Project Methods
Efforts:
1. Formulate the steps for unstructured text processing as a sequence of detailed research tasks suitable for undergraduate students with a background only in data statistics: We will use the computer science techniques of Topic Modeling and Sentiment Analysis to quantify the opinions expressed in an online message regarding food waste beliefs and behaviors and to aggregate such attitudes across all messages. Text data from relevant online social media will first have to be harvested using software. The dataset will be filtered to retain only those messages which are related to food waste. This dataset will then be input to topic modeling algorithms which can identify statistically co-occurring groups of words (the topics), each related to a single aspect of food waste. The snippets of text data relating to each of these aspects are then input to a sentiment analysis algorithm which will quantify the strength and polarity expressed by the online community towards these aspects. The aggregate sentiments towards each of these discovered aspects (e.g., expiration date stamped on a food item) will be cross-referenced with location information.
2. Identify text sources describing factors leading to food waste: We will perform a comprehensive literature review of published material describing the various factors (including beliefs, behaviors, and business practices) contributing to food waste. The purpose of this activity is to collect a set of textual sources that describe food waste factors and that can eventually be used to categorize social media messages into discussions of specific factors (Activity 3). Sources of descriptions of causes of food waste will include academic publications, news media, and blogs.
Advertise the program to students in computer science and biological sciences and recruit students to the program: Students will be enrolled in the training program in cohorts of 25 students each - each lasting one semester (15 weeks). We expect the majority of the students (approximately 15 of the 20) to be from computer science. The remaining students will be recruited from the biological sciences.
3. Conduct the research experience-based training program over four semesters: The students will be organized in 4-5 interdisciplinary groups. Each group will contain at least one "domain-expert" student. Each group will be assigned one specific task of the research process. This will begin with setting up their computing environment (the R programming environment). More complex steps will follow, including downloading data, filtering data, and text data analysis. Each of these steps will have to be performed with varying amounts of computer programming. All students in a group are expected to participate in computer programming. It is expected that computer science students will explain some of the coding steps to the non-computer science students. Similarly, computer science students will be expected to understand the content of the data; we expect that the biological sciences students will explain this content to the rest of their group. At the end of the semester, each group will be expected to prepare a presentation of their research findings.
4. Coordinate with the College of Engineering and Computer Science's Center for Academic Support in Engineering and Computer Science (CASECS) to provide support and space to participating students: Undergraduate students already use the CASECS space (room CS 201) as a study area. Project students will be encouraged to use CS 201 to work on their research tasks. The offices of the student advisors, tutors, and counselors are also in the same space. The students will also be encouraged to utilize these resources to get help as needed and to receive appropriate advising, which is critical to student success.
5. Use feedback from a cohort of students to make changes to the training program to be provided to the following cohorts: The PD will meet the student groups regularly throughout the semester and identify unexpected pain points. In addition, at the end of each semester, each student will be asked to complete a questionnaire about their experiences in the program. This critical feedback will be used to improve the teaching material for the following cohorts.
6. Present the results of the training program at the HSI Education Grants Program's annual meeting: The PD will present the structure of the research experience-based training program, student learning effectiveness, and best practices at the annual meeting. Each student group will be asked to prepare a presentation at the end of their semester-long program describing their research findings. Four students from the previous year will be invited to attend the HSI Education Grants Program's annual meeting and present their work and relay their experiences.
7. Archive the teaching material developed by the end of the project: The teaching material along with the organization details of the research experience-based training program will be submitted to the California State University's MERLOT (Multimedia Education Resource for Learning and Online Teaching) repository. Once archived, the MERLOT system displays the content in a web-accessible format. The content is also searchable by those looking for Open Educational Resources (OER) material for their own teaching purposes.
Evaluation:
A Knowledge Survey will be used as the primary means of evaluating the project. A Knowledge Survey is a detailed set of multiple-choice questions that are designed to be administered to students before and after a course. Students do a self-evaluation of their ability to answer each question. The "Unstructured data in FANH sciences" Knowledge Survey will include questions to quantify the student's knowledge of:
  • Data analysis opportunities in FANH sciences
  • Career opportunities in FANH sciences
  • Unstructured datasets
  • Computational skills (computer programming)
I will design the Knowledge Survey before the first group of students begins their training. This Knowledge Survey will then be completed by all participating students at the beginning of their semester-long training program and at the end. Comparison of the pre- and post-training responses will be used to quantify the extent to which the training program developed students' interest in the FANH sciences and computational skills.
I will collect participant numbers, and participants' academic and demographic data. This data will be used to compute the following metrics:
  • Number of students enrolled in the program
  • Number of Hispanic students in the program
  • Number of underrepresented students in the program. This is the number of CSUF students that belong to the historically underrepresented groups in STEM: Hispanic students, African-Americans, Native Americans, and women.
  • Number of female students in the program
  • Number of students who apply to FANH internships/jobs
  • Number of students who were offered FANH internships/jobs
  • Number of students who accepted FANH internships/jobs
  • Number of group projects
  • Average GPA of enrolled students
In addition, I will use qualitative methods to assess the project outcomes. Qualitative data will be obtained through focus group discussions with students who participate in the training process. Finally, online surveys will be administered to participants after graduating to gauge if their career choice is in an FANH-related area. For this qualitative data analysis, I will consult with CSU Fullerton's Social Science Research Center (SSRC). SSRC assists agencies and organizations to conduct methodologically sound studies to answer policy-relevant research questions.
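The pre- and post-training comparison described under Evaluation could be computed along the following lines. This is only an illustrative sketch in Python: the three-point self-rating scale, the sample ratings, and the names `pre`, `post`, `mean_gain`, and `summarize` are assumptions made for the example and are not taken from the project's actual survey or analysis.

```python
from statistics import mean

# Hypothetical self-ratings: 1 = cannot answer, 2 = could partly answer,
# 3 = confident. Each list holds one rating per student, in the same
# student order before and after the training.
pre = {
    "FANH data analysis opportunities": [1, 2, 1, 2],
    "FANH career opportunities": [1, 1, 2, 2],
    "Unstructured datasets": [1, 1, 1, 2],
    "Computational skills": [2, 1, 3, 2],
}
post = {
    "FANH data analysis opportunities": [2, 3, 2, 3],
    "FANH career opportunities": [2, 2, 3, 2],
    "Unstructured datasets": [3, 2, 3, 3],
    "Computational skills": [2, 2, 3, 3],
}


def mean_gain(pre_scores, post_scores):
    """Average per-student change (post minus pre) for paired ratings."""
    if len(pre_scores) != len(post_scores):
        raise ValueError("pre and post ratings must pair up by student")
    return mean(after - before for before, after in zip(pre_scores, post_scores))


def summarize(pre_ratings, post_ratings):
    """Mean gain for every survey area present in the pre-training data."""
    return {area: mean_gain(pre_ratings[area], post_ratings[area])
            for area in pre_ratings}


gains = summarize(pre, post)
for area, gain in gains.items():
    print(f"{area}: {gain:+.2f}")
```

Pairing ratings by student, rather than comparing cohort averages, keeps the comparison valid when some students skip one of the two surveys, provided unpaired responses are dropped first.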

Progress 09/01/19 to 02/28/23

Outputs
Target Audience: The main effort that was completed was the organization of three semester-long training programs for undergraduate students in the use of the R computer programming language for text processing. The programs were conducted in the Spring semesters of years 2020-2022. Students met once a week for 14 weeks, for one hour per week, with the PD and two student assistants. The target audience was juniors and seniors in the following colleges in CSU Fullerton:
  • Engineering and Computer Science
  • Natural Sciences and Mathematics
  • Humanities and Social Sciences
  • Health and Human Development
86 students were selected from these colleges to participate in the program. The majors were as follows: Biology (17), Public Health (16), Computer Science (15), Mathematics (9), Computer Engineering (6), Kinesiology (5), Psychology (4), Biochemistry (2), Chemistry (2), Child and adolescent development (2), Civil Engineering (2), English (2), Nursing (2), Accounting (1), Business (1), Electrical Engineering (1), History (1), Mechanical Engineering (1), Spanish teacher (1). (Note: the total number of majors does not equal the number of students due to two students having double majors.) 54 of the students were female; 32 were male.
Changes/Problems: Nothing Reported
What opportunities for training and professional development has the project provided?
The PD conducted three training programs, each one semester long, between the years 2020-2022 to teach text analysis using the R computer programming language, primarily to students with little computer programming experience. Unstructured datasets, including text, images, and videos sent over online social networks, are increasingly important in understanding consumer behavior and will need to be analyzed for agricultural applications as well. Each semester, about 30 students (total of 86 students) were selected based on their interest in careers in sustainability, in particular reducing food waste.
The majority of the students were in majors outside the computing sciences and were unlikely to learn text analysis skills while at the university as part of their regular curriculum. The distinctive format of the training program was that students worked in small interdisciplinary groups that included both students with some prior programming experience and students with none. The students met with the PD and two student assistants every week to learn text data analysis topics and complete programming exercises. At the end of the training program, students were able to quantify trends in social media messages on a topic of their interest that was related to sustainability. The group of students was exceptionally diverse. Students from 19 majors, across 4 colleges in the university, participated in the program. 63% of the students were female, compared with approximately 16% of the students enrolled in computer science. The PD also worked closely with 8 students over the course of the project (2-3 students per year) on conducting research on automated methods to identify food-related messages on social media with the goal of generating targeted food waste reduction messages. The students were the main co-authors on peer-reviewed publications and poster presentations. One of the students completed her honors thesis on this research topic. All the student research assistants thus developed valuable research, teaching, and technical writing and presentation skills.
How have the results been disseminated to communities of interest?
Results from research conducted with undergraduate students on automatically analyzing social media messages to identify whether a message is about food preparation, so that a reply with relevant information from USDA's FoodKeeper database can be sent, have been published and/or presented at these venues (* indicates students):
  • 22nd IEEE International Conference on Information Reuse and Integration for Data Science (IRI), 2021. Title: "Encouraging Sustainability Practices through Entity Recognition of Food Items on Social Media." Authors: E. Lee*, B. Chenze*, and A. Panangadan. [Peer-reviewed publication and conference presentation]
  • 23rd IEEE International Conference on Information Reuse and Integration for Data Science (IRI), 2022. Title: "Iterative Approach for Novel Entity Recognition of Foods in Social Media Messages." Authors: B. Chenze*, E. Lee*, and A. Panangadan. [Peer-reviewed publication and conference presentation]
  • CSUF College of Engineering and Computer Science Student Project and Innovation Expo, 2023. Title: "Automatically Recognizing Mentions of Food in Social Media Posts for Targeted Messaging about Food Storage." Authors: A. Shah*, J. Bui*, B. Cortez*, and A. Panangadan. [Poster; winner of Best Computer Science Project award]
  • University Honors Program thesis, California State University, Fullerton. Title: "Targeted Messaging about Food Storage in Social Media Posts." Author: L. Batla* [Honors Thesis]
  • National Conference on Undergraduate Research (NCUR), University of Wisconsin-Eau Claire, April 13-15, 2023. Title: "Targeted Messaging about Food Storage in Social Media Posts." Presenter: B. Cortes* [Oral presentation]
All teaching materials developed over the course of the project were available to students who participated in the semester-long training programs on a Canvas (Learning Management System) website.
What do you plan to do during the next reporting period to accomplish the goals? Nothing Reported

Impacts
What was accomplished under these goals? Impact statement: The project funded a training program to teach computer programming-based text analysis skills to undergraduate students at CSU Fullerton. Unstructured datasets such as text messages over online social networks are increasingly important in understanding consumer behavior and will need to be analyzed for agricultural applications. Each semester, about 30 students (total of 86 students) were selected based on their interest in careers in sustainability. The majority of the students were in majors outside the computing sciences. The distinctive format of the training program was that students worked in small interdisciplinary groups that included students with both some prior programming experience and no programming experience. At the end of the training program, students were able to quantify trends in social media messages on a topic of their interest that was related to sustainability. The students were exceptionally diverse. Students from 19 majors, across 4 colleges in the university, participated in the program. 63% of the students were female; this is in comparison to the approximately 16% of enrolled students who are female in computer science. The teaching material developed over the course of the project can be freely re-used in subsequent years and in other settings outside CSU Fullerton. The PD conducted research with 8 students over the course of the project to develop automated methods to identify food-related messages on social media. These methods will be the core component of a system that can automatically respond to social media messages about food with relevant information about food storage in an effort to reduce end-consumer food waste. 
The research resulted in two peer-reviewed conference publications, a student presentation at the National Conference on Undergraduate Research, a student's Honors thesis, and a poster at CSUF's annual College of Engineering and Computer Science Student Project and Innovation Expo. The poster won the best Computer Science project award in 2023.
Objective 1:
1) Major activities completed / experiments conducted: The PD conducted three semester-long training programs, one every Spring semester in years 2020-2022 as planned. The program recruited students based on their interest in sustainability and careers in this field instead of their prior programming experience. Students met once a week for one hour with the PD, who acted as the instructor, and two student teaching assistants. At the end of the semester, student groups completed a text data analysis project on a topic of their choice.
2) Data collected: All students used tweets collected from Twitter for their end-of-semester projects.
3) Summary statistics and discussion of results: Over the 3 years, 86 undergraduate students participated in the program. The majors: Biology (17), Public Health (16), Computer Science (15), Mathematics (9), Computer Engineering (6), Kinesiology (5), Psychology (4), Biochemistry (2), Chemistry (2), Child and adolescent development (2), Civil Engineering (2), English (2), Nursing (2), Accounting (1), Business (1), Electrical Engineering (1), History (1), Mechanical Engineering (1), Spanish teacher (1). (Note: the total number of majors does not equal the number of students due to students having double majors.) 54 of the students (63%) were female. This compares to the approximately 16% of female students in the BS in Computer Science program at CSU Fullerton.
4) Key outcomes or other accomplishments realized: 86 undergraduate students from different disciplines learned how to analyze text data using computer programming.
Objective 2:
1) Major activities completed / experiments conducted: The PD developed teaching material that is intended to teach students with little to no prior computer programming experience. This includes PowerPoint slides, a set of programming assignments, solutions to the programming assignments, and sample datasets.
2) Data collected: We identified publicly available datasets and also collected our own data to be used as a teaching aid. These were:
  • Collection of Twitter tweets that include keywords related to CSU Fullerton.
  • Collection of Twitter tweets that include keywords related to sustainability.
  • Food Loss dataset from the Food and Agriculture Organization (FAO).
  • National Electronic Injury Surveillance System (NEISS) dataset.
3) Summary statistics and discussion of results: The teaching material takes a student from the very basics of programming in the R language to the more specialized topic of analyzing text datasets.
4) Key outcomes or other accomplishments realized: The project developed teaching material to introduce text data analysis that can be freely re-used in subsequent years and in other settings outside CSUF. This outcome supports Objective 2 of the HSI Education Grants Program since the teaching material that was produced can be used to improve the data analysis instruction provided as part of an FANH education.
Objective 3:
1) Major activities completed / experiments conducted: We researched methods to identify food-related messages on online social media and to extract the specific food-related words in each such message. These methods can be used to build an application (a "chatbot") that responds automatically to social media messages about food with relevant information about food safety and preparation in an effort to reduce food waste by end-consumers.
Our method uses an open-source Natural Language Processing (NLP) library called spaCy and keywords from the FoodKeeper dataset, a dataset provided by USDA's Food Safety and Inspection Service. This approach was demonstrated for messages on Twitter and, by using image object-detection algorithms, for image posts on Instagram.
2) Data collected: The following datasets were used in this research activity:
  • FoodKeeper application from USDA's Food Safety and Inspection Service.
  • 39,000 food-related tweets from a publicly available collection of 1.6 million tweets.
  • A publicly available dataset of Instagram influencers containing over 10 million posts, including images (https://sites.google.com/site/sbkimcv/dataset/instagram-influencer-dataset). Our research used only a subset of the posts that were labeled as food-related.
3) Summary statistics and discussion of results: We evaluated the precision and recall of the entity recognition method on a hand-labeled subset of tweets. Our first method achieved a test-set accuracy of 73% with a precision of 0.96 and a recall of 0.52 (F1-score of 0.68). Our second approach, which used an iterative "snowball" algorithm, achieved a precision of 0.80 and a recall of 0.80 (F1-score of 0.80). Thus, the second approach raised recall by 0.28 (from 0.52 to 0.80), while precision fell from 0.96 to 0.80. We did limited testing on Instagram image posts. We used Microsoft's Computer Vision cloud-based API to recognize objects in images and return a text description of an image. Out of the 25 images in a randomly sampled set of Instagram image posts, 14 had responses that were true positives.
4) Key outcomes or other accomplishments realized: We built the core component of a system that can automatically respond to messages about food with relevant information about food storage in an effort to reduce end-consumer food waste.
We have made progress in the most challenging part of such a system, which is to recognize food-related tweets and specific foods referenced in them with reasonably high accuracy. Over the course of this project, this research activity included 8 students who were hired as research assistants. The research resulted in two peer-reviewed conference publications, a student presentation at the National Conference on Undergraduate Research, a student's Honors thesis, and a poster at CSUF's annual College of Engineering and Computer Science Student Project and Innovation Expo. The poster won the best Computer Science project award in 2023.
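The F1-scores quoted under Objective 3 follow from the reported precision and recall through the standard harmonic-mean formula. The Python sketch below recomputes them; the function and variable names are illustrative assumptions. Using the rounded inputs, the first method comes out near 0.67, slightly below the reported 0.68, a gap that presumably reflects rounding of the published precision and recall.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported for the two entity recognition methods.
first_method = f1_score(0.96, 0.52)
snowball_method = f1_score(0.80, 0.80)

print(f"First method F1:    {first_method:.3f}")
print(f"Snowball method F1: {snowball_method:.3f}")
print(f"Recall change:      {0.80 - 0.52:+.2f}")
print(f"Precision change:   {0.80 - 0.96:+.2f}")
```

Because F1 is a harmonic mean, it is pulled toward the lower of the two inputs, which is why the first method's high precision does not compensate for its low recall.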

Publications

  • Type: Theses/Dissertations Status: Published Year Published: 2022 Citation: L. Batla, Targeted Messaging about Food Storage in Social Media Posts, University Honors Program thesis, California State University, Fullerton. 2022
  • Type: Other Status: Other Year Published: 2023 Citation: A. Shah, J. Bui, B. Cortez, A. Panangadan, Automatically Recognizing Mentions of Food in Social Media Posts for Targeted Messaging about Food Storage, Poster at 2023 California State University, Fullerton College of Engineering and Computer Science Student Project and Innovation Expo.


Progress 09/01/21 to 08/31/22

Outputs
Target Audience: The main effort that was completed this year was the organization of the third semester-long training program for undergraduate students in the use of the R computer programming language for text processing. Students met once a week for 14 weeks, for one hour per week, with the PD and two student assistants. The target audience was juniors and seniors in the following colleges in CSU Fullerton:
  • Engineering and Computer Science
  • Natural Sciences and Mathematics
  • Humanities and Social Sciences
  • Health and Human Development
25 students were selected from these colleges to participate in the program. The majors were as follows: Public Health (7), Biology (5), Computer Science (4), Child and Adolescent Development (2), Kinesiology (2), Mathematics (2), Business (1), Chemistry (1), Computer Engineering (1), Electrical Engineering (1), Nursing (1). (Note: the total number of majors does not equal the number of students due to two students having double majors.)
Changes/Problems: Nothing Reported
What opportunities for training and professional development has the project provided?
The PD conducted the third installment of the semester-long training program to teach text analysis using the R computer programming language in Spring 2022. 25 undergraduate students participated in the program. The majority of the students were from majors outside engineering and computer science, and most of them had little prior computer programming experience. More than half of the students were female, compared with less than 20% of the students enrolled in computer science. In addition, two undergraduate students were hired as research/teaching assistants. These students helped training program participants complete their assignments and later participated in research to develop food entity recognition methods.
How have the results been disseminated to communities of interest?
Our research on the iterative method to identify food entities in social media messages has been published after peer review at the 23rd IEEE International Conference on Information Reuse and Integration for Data Science (IRI). The work was also orally presented at the conference:
  • 23rd IEEE International Conference on Information Reuse and Integration for Data Science (IRI), 2022. Title: "Iterative Approach for Novel Entity Recognition of Foods in Social Media Messages." Authors: B. Chenze, E. Lee, and A. Panangadan.
The work was also presented as a poster at the annual CSU Fullerton College of ECS Student Projects Showcase.
What do you plan to do during the next reporting period to accomplish the goals?
In the last 6 months of the project, the following tasks will be performed:
  • Continue research into methods for food entity recognition; evaluate the performance of this approach on the Instagram image-based posts in addition to text messages on Twitter.
  • Make teaching material more visually attractive.
  • Disseminate findings in venues inside and outside CSU Fullerton.

Impacts
What was accomplished under these goals? Impact statement: In the third year of this project, the PD repeated the semester-long training program to teach text analysis using a computer programming language. 25 undergraduate students from diverse disciplines participated in the program, learned how to analyze text data, and were able to complete text analysis projects on a topic of their choice. As in previous years, students were selected primarily based on their interest in working on sustainability areas, rather than their prior experience with computer programming. As a result, the majority of the students were from majors outside engineering and computer science and most of them had little prior computer programming experience. More than half of the students were female; this is in comparison to the less than 20% of enrolled students who are female in computer science. At the end of the training program, students were able to quantify trends in social media messages on a topic of their interest. The PD conducted research with students to develop a method that can automatically identify when a social media message is about food. The goal of this research is to enable replies, containing tips on reducing food spoilage (from the FoodKeeper dataset provided by the USDA Food Safety and Inspection Service), to be automatically generated. The team improved upon the food entity recognition method that was developed in the previous year and results have been published in a peer-reviewed conference. Objective 1: 1) Major activities completed / experiments conducted The PD conducted the semester-long training program during the Spring 2022 semester as planned. This was the third time the program was offered to students. The program prioritized recruiting students based on their interest in sustainability and careers in this field rather than on their prior programming experience. 
Students met once a week for one hour with the PD, who acted as the instructor, and two student teaching assistants. The sessions were held in person after the first two weeks, but students had the option to join over Zoom if they had concerns because of the pandemic. In each session, the instructor introduced a few topics related to computer data analysis using the R programming language. Each concept was taught in a "hands-on" manner: students completed programming assignments in small groups (in Zoom breakout rooms or with fellow students in the classroom). The PD and student assistants would move between groups/breakout rooms to answer questions and help students with the assignments. At the end of the semester, student groups completed a text data analysis project on a topic of their choice. 2) Data collected Most students downloaded tweets containing keywords related to sustainability (e.g., composting, electric vehicles) and the ongoing pandemic. 3) Summary statistics and discussion of results 25 undergraduate students were selected to participate in the program. The majors were as follows: Public Health (7), Biology (5), Computer Science (4), Child and Adolescent Development (2), Kinesiology (2), Mathematics (2), Business (1), Chemistry (1), Computer Engineering (1), Electrical Engineering (1), Nursing (1). (Note: the total number of majors does not equal the number of students because two students had double majors.) 4) Key outcomes or other accomplishments realized 25 undergraduate students from different disciplines learned how to analyze text data using computer programming. Two undergraduate students worked as research/teaching assistants. Research conducted with students on text processing resulted in a peer-reviewed publication and a presentation at the annual College of Engineering and Computer Science Projects Showcase.
Objective 2: 1) Major activities completed / experiments conducted The PD continued to update the teaching material: a set of PowerPoint slides and a sequence of short computer programming assignments that are meant to be completed by a student while following the PowerPoint presentation. We made the following modifications to the teaching material: 1) added the Food Loss dataset from the Food and Agriculture Organization (FAO), 2) added a new set of computer programming assignments that make use of the FAO Food Loss dataset, and 3) made the PowerPoint slides more visually attractive. 2) Data collected We identified a new publicly available dataset, the Food Loss dataset from the Food and Agriculture Organization (FAO), to be used as a teaching aid. 3) Summary statistics and discussion of results In addition to the previously collected datasets, the new FAO dataset enables students to work on programming assignments using data that is directly relevant to their interest in reducing food waste. 4) Key outcomes or other accomplishments realized The updated teaching material can be used to teach students, starting from the very basics of programming in the R language and progressing to the more specialized topic of analyzing text datasets. Objective 3: 1) Major activities completed / experiments conducted We continued development of methods to automatically identify social media messages about food. Our method will enable "food entity recognition", the computational task of identifying words or phrases in English text that correspond to specific foods. In the third year of the project, the PD and two undergraduate student research assistants extended the Snowball approach to enable recognition of keywords describing foods in unstructured text that is typical in social media. This approach uses a set of food-related keywords from the FoodKeeper dataset, a dataset provided by USDA's Food Safety and Inspection Service.
The iterative approach starts with dataset messages that are most likely to be about some food. Likelihood is based on the number of keywords that appear in a message. We tested this approach by identifying food entities in messages on the Twitter network. We evaluated the accuracy of this method and published quantitative results in a peer-reviewed conference. In the third year of the project, we also looked into extending this method to image-based messages posted to the Instagram social media network, to see if it is possible to reliably identify food-related posts. 2) Data collected The following datasets were used in this research activity. FoodKeeper application from USDA's Food Safety and Inspection Service. 39,000 food-related tweets from a publicly available collection of 1.6 million tweets. A publicly available dataset of Instagram influencers containing over 10 million posts, including images (https://sites.google.com/site/sbkimcv/dataset/instagram-influencer-dataset). 3) Summary statistics and discussion of results We evaluated the precision and recall of the "Snowball" food entity recognition method on a hand-labeled subset of 80 tweets. The approach achieved a precision of 0.80, and a recall of 0.80 (f1-score of 0.80). This represents a significant increase of 0.28 in recall performance compared to our previous approach. We have also begun to quantify the accuracy of detection of food images in Instagram posts. 4) Key outcomes or other accomplishments realized We continued to build the components of a system that can ultimately be used to determine whether tweets are food-related. Successful development of such a system will enable relevant responses, containing tips for reducing food waste, to be automatically sent to the poster of the message. We continued to make progress in the most challenging part of such a system, which is to recognize food-related tweets and specific foods referenced in them with high accuracy.
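The seeding step described above, in which messages are ranked by how many food keywords they contain, can be sketched roughly as follows. This is a minimal illustration in Python and not the project's code: the keyword set, the example messages, the tokenizer, and the function names are all hypothetical assumptions, and only the initial ranking is shown, not the later iterations of the approach.

```python
import re

# Hypothetical seed keywords; the project took its keywords from the FoodKeeper dataset.
SEED_KEYWORDS = {"milk", "bread", "chicken", "yogurt", "lettuce"}


def keyword_count(message, keywords=SEED_KEYWORDS):
    """Count how many tokens in the message are seed keywords."""
    tokens = re.findall(r"[a-z']+", message.lower())
    return sum(1 for token in tokens if token in keywords)


def rank_messages(messages, keywords=SEED_KEYWORDS):
    """Return (count, message) pairs with at least one keyword, highest count first."""
    scored = [(keyword_count(m, keywords), m) for m in messages]
    scored = [pair for pair in scored if pair[0] > 0]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Made-up example messages, not taken from the tweet collection.
example_messages = [
    "Left the milk and yogurt out overnight, is the chicken still ok?",
    "Traffic on the 57 is terrible today",
    "Fresh bread from the bakery!",
]

ranked = rank_messages(example_messages)
for count, message in ranked:
    print(count, message)
```

Messages with the most keyword matches come first, so an iterative method could begin with the messages that are most likely to be about food.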

Publications

  • Type: Conference Papers and Presentations Status: Published Year Published: 2022 Citation: B. Chenze, E. Lee, and A. Panangadan. "Iterative Approach for Novel Entity Recognition of Foods in Social Media Messages." In 2022 IEEE 23rd International Conference on Information Reuse and Integration for Data Science (IRI), pp. 126-131. IEEE, 2022.


Progress 09/01/20 to 08/31/21

Outputs
Target Audience: The main effort completed this year was the organization of a semester-long training program for undergraduate students in the use of the R computer programming language for text processing. Students met once a week for 14 weeks, for one hour per week, with the PD and two student assistants. The target audience was juniors and seniors in the following colleges in CSU Fullerton: Engineering and Computer Science; Natural Sciences and Mathematics; Humanities and Social Sciences; and Health and Human Development. 31 students were selected from these colleges to participate in the program. The majors were as follows: Biology (8), Public Health (7), Computer Science (4), Biochemistry (2), Civil Engineering (2), Computer Engineering (2), Kinesiology (2), Mathematics (2), Chemistry (1), English (1), Mechanical Engineering (1), Nursing (1). (Note: the total number of majors does not equal the number of students because two students had double majors.) 20 of the students were female; 11 were male. Changes/Problems: Nothing Reported. What opportunities for training and professional development has the project provided? The PD conducted a semester-long training program to teach text analysis using the R computer programming language. 31 undergraduate students from diverse disciplines participated in the program. The majority of the students were from majors outside engineering and computer science and most of them had little prior computer programming experience. More than half of the students were female; this is in comparison to the less than 20% of enrolled students who are female in computer science. Undergraduate students outside the computing sciences are unlikely to learn text analysis skills while at the university. However, online social networks contain unstructured text data and such datasets will need to be analyzed for agricultural applications, for instance, to understand consumer behavior.
At the end of the training program, students were able to quantify trends in social media messages on a sustainability-related topic of their interest. How have the results been disseminated to communities of interest? Our research on a method to respond automatically to social media messages about food with relevant information about food safety and preparation was published, after peer review, at the 22nd IEEE International Conference on Information Reuse and Integration for Data Science (IRI). The work was also presented orally at the conference. E. Lee, B. Chenze and A. Panangadan, "Encouraging Sustainability Practices through Entity Recognition of Food Items on Social Media," 2021 IEEE 22nd International Conference on Information Reuse and Integration for Data Science (IRI), 2021, pp. 263-266, doi: 10.1109/IRI51335.2021.00042. What do you plan to do during the next reporting period to accomplish the goals? In Year 3 of the project, the following tasks will be performed: (1) conduct a third installment of the semester-long text processing training program (in the Spring 2022 semester); (2) continue research into food entity recognition; (3) develop a proof-of-concept system to demonstrate how replies with tips on reducing food spoilage can be automatically generated for use on the Twitter social media platform; (4) disseminate findings in conferences and symposia on food waste.

Impacts
What was accomplished under these goals? Impact statement: In the second year of this project, the PD conducted a semester-long training program to teach text analysis using a computer programming language. 31 undergraduate students from diverse disciplines participated in the program, learned how to analyze text data, and were able to complete text analysis projects on a topic of their choice. A majority of the students were from majors outside engineering and computer science and most of them had little prior computer programming experience. More than half of the students were female; this is in comparison to the less than 20% of enrolled students who are female in computer science. Text processing is considered an "advanced" data analysis topic and students outside the computing sciences are unlikely to learn these skills while at the university. However, online social networks contain unstructured text data and such datasets will need to be analyzed for agricultural applications, for instance, to understand consumer behavior. At the end of the training program, students were able to quantify trends in social media messages on a topic of their interest. The PD and student researchers are also building the components of a system that can ultimately be used to determine whether social media messages are food-related in real-time. When a message is recognized to be related to food, a reply will be automatically generated with tips on reducing food spoilage using the FoodKeeper dataset (a dataset provided by the USDA Food Safety and Inspection Service). The team has made significant progress in the most challenging part of such a system, which is to recognize food-related tweets and specific foods referenced in them with high accuracy. Objective 1: 1) Major activities completed / experiments conducted The PD conducted the semester-long training program during the Spring 2021 semester as planned. This was the second time the program was offered to students. 
The program recruited students based on their interest in sustainability and careers in this field instead of their prior programming experience. Students met once a week for one hour with the PD, who acted as the instructor, and two student teaching assistants. All sessions were online over Zoom because of the pandemic. In each session, the instructor introduced one topic related to computer data analysis using the R programming language and interspersed the presentation with programming assignments conducted in small groups in Zoom breakout rooms. The PD and student assistants would move between breakout rooms to answer questions and help students with the assignments. At the end of the semester, student groups completed a text data analysis project on a topic of their choice. 2) Data collected All students used tweets collected from Twitter for the end-of-semester projects. Most students downloaded tweets containing keywords related to sustainability (e.g., composting, electric vehicles) and the ongoing pandemic. 3) Summary statistics and discussion of results 31 undergraduate students were selected to participate in the program. The majors were as follows: Biology (8), Public Health (7), Computer Science (4), Biochemistry (2), Civil Engineering (2), Computer Engineering (2), Kinesiology (2), Mathematics (2), Chemistry (1), English (1), Mechanical Engineering (1), Nursing (1). (Note: the total number of majors does not equal the number of students because two students had double majors.) 20 of the students were female; 11 were male. This compares to the less than 20% of female students in the BS in Computer Science program at CSU Fullerton. 4) Key outcomes or other accomplishments realized 31 undergraduate students from different disciplines learned how to analyze text data using a computer programming language. Two students were hired as research/teaching assistants.
One student was an undergraduate student in Computer Science; the other was a graduate student in Computer Science. Research conducted with these two students on text processing resulted in a peer-reviewed publication. Objective 2: 1) Major activities completed / experiments conducted The PD continued to update the teaching material. These include PowerPoint slides and a set of programming assignments that are meant to be completed by a student while following the PowerPoint presentation. The programming assignments use the built-in datasets in R and also use text data that is more current and relevant to a student's daily life, such as tweets about the university. 2) Data collected We added to the collection of datasets that could be used for the short programming assignments, student projects, and for research. The additional datasets were: Collection of Twitter tweets that included keywords related to a topic related to sustainability that was of interest to a student group, e.g., electric vehicles. A publicly available collection of 1.6 million tweets from 2009. The dataset has a large variety of tweets, and specifically has both food related and not food related tweets. 3) Summary statistics and discussion of results The large number and variety of tweets in the publicly-available tweets dataset enabled us to use this as both training and validation sets for entity recognition. To identify food-related words in a tweet, we made use of the keywords in USDA's Food Safety and Inspection Service FoodKeeper dataset (collected in the previous year). 4) Key outcomes or other accomplishments realized We now have a wider variety of teaching material to teach undergraduate students to analyze text data using the R computer programming language. The material includes more data specific to food waste applications. 
Objective 3: 1) Major activities completed / experiments conducted We are doing research to develop a method to respond automatically to social media messages about food with relevant information about food safety and preparation in an effort to reduce food waste by end-consumers. The main challenge is "entity recognition", the computational task of identifying words or phrases in natural language text that correspond to real-world objects such as foods. Current entity recognition methods can recognize only a relatively small set of entity types. We are developing a method to extend existing entity recognition systems to recognize foods without requiring a large labeled dataset by using the FoodKeeper dataset, a dataset provided by USDA's Food Safety and Inspection Service, and which contains information on a variety of foods. Our work is designed to enable recognition of novel entity types such as food from unstructured text that is typical in social media. 2) Data collected We used USDA's Food Safety and Inspection Service FoodKeeper dataset and the 1.6 million tweet dataset that was described in Objective 2. 3) Summary statistics and discussion of results We evaluated the precision and recall of the entity recognition method on a hand-labeled subset of tweets. The system achieved a precision of 0.80 and a recall of 0.80 (f-score of 0.80) on this dataset. Previous studies have reported a model that achieved a precision of 0.96, a recall of 0.52 and f1-score of 0.68. Our approach was able to achieve a 0.28 increase in recall performance. More than just the increased recall and f1-score, the model uses the context of food entities in tweets better than a baseline model. 4) Key outcomes or other accomplishments realized We are building the components of a system that can ultimately be used to determine whether tweets are food-related. 
When a tweet is recognized as food-related, an insightful response can be posted informing the user of different ways to maximize the longevity of that food. We have made progress in the most challenging part of such a system, which is to recognize food-related tweets and specific foods referenced in them with high accuracy.
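For readers unfamiliar with the evaluation metrics reported under Objective 3, the following Python sketch shows how precision, recall, and F1 are conventionally computed, and checks the arithmetic of the reported figures. The counts of true positives, false positives, and false negatives are hypothetical, chosen only because they yield 0.80 for each metric; the report does not give the actual counts.

```python
def precision_recall_f1(true_positives, false_positives, false_negatives):
    """Compute precision, recall, and F1 from entity-level counts."""
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    f1 = f1_score(precision, recall)
    return precision, recall, f1


def f1_score(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts (not from the report): 8 correct food entities,
# 2 spurious detections, 2 missed entities.
precision, recall, f1 = precision_recall_f1(8, 2, 2)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")

# Recall gain over the earlier model's reported recall of 0.52.
recall_gain = round(0.80 - 0.52, 2)
print(f"recall gain={recall_gain:.2f}")
```

When precision and recall are equal, F1 equals the same value, which is why the reported precision and recall of 0.80 give an F1 of 0.80.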

Publications

  • Type: Conference Papers and Presentations Status: Published Year Published: 2021 Citation: E. Lee, B. Chenze and A. Panangadan, "Encouraging Sustainability Practices through Entity Recognition of Food Items on Social Media," 2021 IEEE 22nd International Conference on Information Reuse and Integration for Data Science (IRI), 2021, pp. 263-266, doi: 10.1109/IRI51335.2021.00042.


Progress 09/01/19 to 08/31/20

Outputs
Target Audience: The main effort completed this year was the organization of a semester-long training program for undergraduate students in the use of a computer programming language for text processing. Students met once a week for 14 weeks, for one hour per week, with the PD and two teaching assistants. The target audience was juniors and seniors in the following colleges in CSU Fullerton: Engineering and Computer Science; Natural Sciences and Mathematics; Humanities and Social Sciences; and Health and Human Development. 29 students were selected from these colleges to participate in the program. The majors were as follows: Computer Science (7), Mathematics (5), Psychology (4), Biology (3), Computer Engineering (3), Kinesiology (1), English (1), Spanish teacher (1), History (1), Accounting (1), Public Health (1). (Note: the total number of majors does not equal the number of students because some students had double majors.) 15 of the students were male; 14 were female. Changes/Problems: Nothing Reported. What opportunities for training and professional development has the project provided? 29 undergraduate students from different disciplines learned how to analyze text data using a computer programming language. Two students learned of USDA activities and opportunities (e.g., internships) by attending the American Association of Hispanics in Higher Education (AAHHE) conference in Costa Mesa, California. Two students developed teaching skills by working as teaching assistants for the program. One graduate student completed her MS thesis under the supervision of the PD. One undergraduate student completed an Independent Study under the supervision of the PD. This student then went on to join a graduate program. How have the results been disseminated to communities of interest? A flyer was prepared advertising the benefits of participating in the text analysis training program.
The flyer was vetted by Sergio Guerra, Director of the Center for Academic Support in Engineering & Computer Science (CASECS), CSU Fullerton. The flyer was distributed via the university's advisors email distribution system to all juniors and seniors in the colleges of Engineering and Computer Science, Natural Sciences and Mathematics, Humanities and Social Sciences, and Health and Human Development. What do you plan to do during the next reporting period to accomplish the goals? Nothing Reported

Impacts
What was accomplished under these goals? Impact statement: By the end of the first year of this project, the PD developed and conducted a semester-long training program to teach text analysis using a computer programming language. 29 undergraduate students from diverse disciplines participated in the program, learned how to analyze text data, and were able to complete text analysis projects on a topic of their choice. Text processing is considered an "advanced" data analysis topic and students outside the computing sciences are unlikely to learn these skills while at the university. However, emerging datasets from a variety of sources, including online social networks, contain unstructured text data and such datasets will need to be analyzed for agricultural applications, for instance, to understand consumer behavior. This program thus provided students with an interest in careers in sustainability the skills to analyze this type of data. The students were selected based on their written interest in sustainable behaviors and careers in sustainability. Nearly two-thirds were from majors outside engineering and computer science and most of these students had little prior computer programming experience. Nearly half of the students were female; this is in comparison to the less than 20% of enrolled students who are female in computer science. Students worked in a collaborative inter-disciplinary setting in weekly meetings over the course of a semester to learn text data processing skills. At the end of the training program, students were able to quantify trends in social media messages on a topic of their interest (e.g., analysis of Twitter tweets on food shortage). 1) Major activities completed / experiments conducted Objective 1: The PD developed and conducted a semester-long training program during the Spring 2020 semester as planned. 
The program recruited students based on their interest in sustainability and careers in this field instead of their prior programming experience. Students met once a week for one hour with the PD, who acted as the instructor, and two student teaching assistants. In each session, the instructor introduced one topic related to computer data analysis using the R programming language and interspersed the presentation with collaborative assignments. The PD and two student assistants would move between groups to answer questions and help students with the assignments. At the end of the semester, each group completed a text data analysis project on a topic of their choice. The student projects completed at the end of the program were: An Analysis of the 2019 Forest Fires; Food Shortage Tweet Analysis; Mental Health Effects of Lockdown in California; Effects of COVID-19 on the Agriculture Industry; Physical Fitness Mini-project; Analysis of the sentiment toward COVID-19 using Twitter data; Sustainability During COVID-19; Crude Oil Crash Sentiment Analysis; Sustainability and Agriculture - does modern society have a positive or negative outlook on sustainability pertaining to agriculture? Objective 2: The PD developed a curriculum tailored to students with little prior programming experience to learn basic text processing skills. Text processing is typically considered an "advanced" topic and students outside the computing sciences are unlikely to learn these skills while at the university. The teaching material included PowerPoint slides and a set of guided assignments that are meant to be completed while following the PowerPoint presentation. Objective 3: The PD and the student research/teaching assistants identified datasets that could be used for understanding trends in sustainable behavior by using text processing methods. The datasets included: Collection of Twitter tweets that included keywords related to sustainability.
A graduate student analyzed this dataset as part of her MS thesis. The dataset underlying the FoodKeeper application from USDA's Food Safety and Inspection Service about food and beverages storage. 2) Data collected In the first year of the project, the focus was on identifying datasets that could be used as a teaching aid to teach text processing skills. These were: Collection of Twitter tweets that included keywords related to CSU Fullerton. This dataset was used for the group assignments to teach text processing. Collection of Twitter tweets that included keywords related to sustainability. A graduate student analyzed this dataset as part of her MS thesis. The dataset underlying the FoodKeeper application from USDA's Food Safety and Inspection Service about food and beverages storage. The intention is to use this dataset to develop a social media application to promote food wastage reduction practices. 3) Summary statistics and discussion of results 29 undergraduate students were selected to participate in the program. The majors were as follows: Computer Science (7) Mathematics (5) Psychology (4) Biology (3) Computer Engineering (3) Kinesiology (1) English (1) Spanish teacher (1) History (1) Accounting (1) Public Health (1) (Note: the total number of majors do not equal the number of students due to some students having double majors) 15 of the students were male; 14 were female. This compares to the less than 20% of female students in the BS in Computer Science program at CSU Fullerton. Two students were hired as research/teaching assistants. One student was an undergraduate student in Computer Science; the other was a graduate student in Computer Science. Both these students were female. 4) Key outcomes or other accomplishments realized The curriculum and teaching material for a program to teach undergraduate students to analyze text data using a computer programming language were developed. 
The material is designed for students who have little prior experience with computer programming. 29 undergraduate students from different disciplines learned how to analyze text data using a computer programming language. Datasets were identified that will be used as teaching aids and for student research projects to develop applications to promote food waste reduction by consumers.
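As an illustration of what "quantifying trends in social media messages" can mean in practice, the sketch below counts, per month, how many messages mention a keyword. It is a hypothetical example and not the program's teaching material: the program taught R, Python is used here only for illustration, and the messages, dates, and function name are made up.

```python
from collections import Counter
from datetime import date

# Made-up (date, text) pairs standing in for a downloaded tweet collection.
example_tweets = [
    (date(2020, 3, 2), "Started composting at home this week"),
    (date(2020, 3, 18), "Composting tips for small apartments"),
    (date(2020, 4, 5), "Is composting worth it? A thread"),
    (date(2020, 4, 9), "Thinking about buying an electric vehicle"),
]


def monthly_keyword_counts(tweets, keyword):
    """Count tweets per calendar month whose text contains the keyword."""
    counts = Counter()
    for posted_on, text in tweets:
        if keyword.lower() in text.lower():
            counts[posted_on.strftime("%Y-%m")] += 1
    # Sort by month so the trend reads chronologically.
    return dict(sorted(counts.items()))


composting_trend = monthly_keyword_counts(example_tweets, "composting")
print(composting_trend)
```

A month-by-month count like this is one simple way to see whether attention to a topic is rising or falling over time.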

Publications