Progress 09/01/19 to 02/28/23
Outputs Target Audience: The main effort completed was the organization of three semester-long training programs for undergraduate students in the use of the R computer programming language for text processing. The programs were conducted in the Spring semesters of 2020-2022. Students met with the PD and two student assistants once a week, for one hour, for 14 weeks. The target audience was juniors and seniors in the following colleges at CSU Fullerton:
• Engineering and Computer Science
• Natural Sciences and Mathematics
• Humanities and Social Sciences
• Health and Human Development
86 students were selected from these colleges to participate in the program. The majors were as follows: Biology (17), Public Health (16), Computer Science (15), Mathematics (9), Computer Engineering (6), Kinesiology (5), Psychology (4), Biochemistry (2), Chemistry (2), Child and Adolescent Development (2), Civil Engineering (2), English (2), Nursing (2), Accounting (1), Business (1), Electrical Engineering (1), History (1), Mechanical Engineering (1), Spanish teacher (1). (Note: the total number of majors does not equal the number of students because some students had double majors.) 54 of the students were female; 32 were male.
Changes/Problems:
Nothing Reported
What opportunities for training and professional development has the project provided? The PD conducted three semester-long training programs between 2020 and 2022 to teach text analysis using the R computer programming language, primarily to students with little computer programming experience. Unstructured datasets, including text, images, and videos sent over online social networks, are increasingly important in understanding consumer behavior and will also need to be analyzed for agricultural applications. Each semester, about 30 students (86 students in total) were selected based on their interest in careers in sustainability, in particular reducing food waste. The majority of the students were in majors outside the computing sciences and were unlikely to learn text analysis skills as part of their regular university curriculum. The distinctive format of the training program was that students worked in small interdisciplinary groups that mixed students who had some prior programming experience with students who had none. The students met with the PD and two student assistants every week to learn text data analysis topics and complete programming exercises. At the end of the training program, students were able to quantify trends in social media messages on a sustainability-related topic of their interest. The group of students was exceptionally diverse. Students from 19 majors, across 4 colleges in the university, participated in the program. 63% of the students were female, compared with the approximately 16% of students enrolled in computer science who are female. The PD also worked closely with 8 students over the course of the project (2-3 students per year) on research into automated methods to identify food-related messages on social media, with the goal of generating targeted food waste reduction messages.
The students were the main co-authors on peer-reviewed publications and poster presentations. One of the students completed her honors thesis on this research topic. All the student research assistants thus developed valuable research, teaching, technical writing, and presentation skills.
How have the results been disseminated to communities of interest? The research conducted with undergraduate students addressed automatically analyzing social media messages to identify whether a message is about food preparation, so that a reply with relevant information from USDA's FoodKeeper database can be generated. The results have been published and/or presented at these venues (* indicates students):
- 22nd IEEE International Conference on Information Reuse and Integration for Data Science (IRI), 2021. Title: "Encouraging Sustainability Practices through Entity Recognition of Food Items on Social Media." Authors: E. Lee*, B. Chenze*, and A. Panangadan. [Peer-reviewed publication and conference presentation]
- 23rd IEEE International Conference on Information Reuse and Integration for Data Science (IRI), 2022. Title: "Iterative Approach for Novel Entity Recognition of Foods in Social Media Messages." Authors: B. Chenze*, E. Lee*, and A. Panangadan. [Peer-reviewed publication and conference presentation]
- CSUF College of Engineering and Computer Science Student Project and Innovation Expo, 2023. Title: "Automatically Recognizing Mentions of Food in Social Media Posts for Targeted Messaging about Food Storage." Authors: A. Shah*, J. Bui*, B. Cortez*, and A. Panangadan. [Poster; winner of Best Computer Science Project award]
- University Honors Program thesis, California State University, Fullerton. Title: "Targeted Messaging about Food Storage in Social Media Posts." Author: L. Batla* [Honors Thesis]
- National Conference on Undergraduate Research (NCUR), University of Wisconsin-Eau Claire, April 13-15, 2023. Title: "Targeted Messaging about Food Storage in Social Media Posts." Presenter: B. Cortez* [Oral presentation]
All teaching materials developed over the course of the project were made available to students who participated in the semester-long training programs through a Canvas (Learning Management System) website.
What do you plan to do during the next reporting period to accomplish the goals?
Nothing Reported
Impacts What was accomplished under these goals?
Impact statement: The project funded a training program to teach programming-based text analysis skills to undergraduate students at CSU Fullerton. Unstructured datasets such as text messages sent over online social networks are increasingly important in understanding consumer behavior and will need to be analyzed for agricultural applications. Each semester, about 30 students (86 students in total) were selected based on their interest in careers in sustainability. The majority of the students were in majors outside the computing sciences. The distinctive format of the training program was that students worked in small interdisciplinary groups that mixed students who had some prior programming experience with students who had none. At the end of the training program, students were able to quantify trends in social media messages on a sustainability-related topic of their interest. The students were exceptionally diverse. Students from 19 majors, across 4 colleges in the university, participated in the program. 63% of the students were female, compared with the approximately 16% of students enrolled in computer science who are female. The teaching material developed over the course of the project can be freely re-used in subsequent years and in other settings outside CSU Fullerton. The PD conducted research with 8 students over the course of the project to develop automated methods to identify food-related messages on social media. These methods will be the core component of a system that can automatically respond to social media messages about food with relevant information about food storage in an effort to reduce end-consumer food waste. The research resulted in two peer-reviewed conference publications, a student presentation at the National Conference on Undergraduate Research, a student's Honors thesis, and a poster at CSUF's annual College of Engineering and Computer Science Student Project and Innovation Expo. 
The poster won the best Computer Science project award in 2023. Objective 1: 1) Major activities completed / experiments conducted The PD conducted three semester-long training programs, one every Spring semester in 2020-2022, as planned. The program recruited students based on their interest in sustainability and careers in this field rather than their prior programming experience. Students met once a week for one hour with the PD, who acted as the instructor, and two student teaching assistants. At the end of the semester, student groups completed a text data analysis project on a topic of their choice. 2) Data collected All students used tweets collected from Twitter for their end-of-semester projects. 3) Summary statistics and discussion of results Over the 3 years, 86 undergraduate students participated in the program. The majors: Biology (17), Public Health (16), Computer Science (15), Mathematics (9), Computer Engineering (6), Kinesiology (5), Psychology (4), Biochemistry (2), Chemistry (2), Child and Adolescent Development (2), Civil Engineering (2), English (2), Nursing (2), Accounting (1), Business (1), Electrical Engineering (1), History (1), Mechanical Engineering (1), Spanish teacher (1). (Note: the total number of majors does not equal the number of students because some students had double majors.) 54 of the students (63%) were female. This compares to the approximately 16% of students in the BS in Computer Science program at CSU Fullerton who are female. 4) Key outcomes or other accomplishments realized 86 undergraduate students from different disciplines learned how to analyze text data using computer programming. Objective 2: 1) Major activities completed / experiments conducted The PD developed teaching material intended for students with little to no prior computer programming experience. This includes PowerPoint slides, a set of programming assignments, solutions to the programming assignments, and sample datasets. 
2) Data collected We identified publicly-available datasets and also collected our own data to be used as a teaching aid. These were: Collection of Twitter tweets that include keywords related to CSU Fullerton. Collection of Twitter tweets that include keywords related to sustainability. Food Loss dataset from the Food and Agriculture Organization (FAO). National Electronic Injury Surveillance System (NEISS) dataset. 3) Summary statistics and discussion of results The teaching material takes a student from the very basics of programming in the R language to the more specialized topic of analyzing text datasets. 4) Key outcomes or other accomplishments realized The project developed teaching material to introduce text data analysis that can be freely re-used in subsequent years and in other settings outside CSUF. This outcome supports Objective 2 of the HSI Education Grants Program since the teaching material that was produced can be used to improve the data analysis instruction provided as part of an FANH education. Objective 3: 1) Major activities completed / experiments conducted We researched methods to identify food-related messages on online social media and to extract the specific food-related words in each such message. These methods can be used to build an application to respond automatically to social media messages (a "chatbot") about food with relevant information about food safety and preparation in an effort to reduce food waste by end-consumers. Our method uses an open-source Natural Language Processing (NLP) library called spaCy and keywords from the FoodKeeper dataset, a dataset provided by USDA's Food Safety and Inspection Service. This approach was demonstrated for messages on Twitter and image posts on Instagram by using image object-detection algorithms. 2) Data collected The following datasets were used in this research activity. FoodKeeper application from USDA's Food Safety and Inspection Service. 
39,000 food-related tweets from a publicly available collection of 1.6 million tweets. A publicly available dataset of Instagram influencers containing over 10 million posts, including images (https://sites.google.com/site/sbkimcv/dataset/instagram-influencer-dataset). Our research used only a subset of the posts that were labeled as food-related. 3) Summary statistics and discussion of results We evaluated the precision and recall of the entity recognition method on a hand-labeled subset of tweets. Our first method achieved a test-set accuracy of 73% with a precision of 0.96 and a recall of 0.52 (f1-score of 0.68). Our second approach, which used an iterative "snowball" algorithm, achieved a precision of 0.80 and a recall of 0.80 (f1-score of 0.80). Thus, the second approach increased recall by 0.28. We did limited testing on Instagram image posts. We used Microsoft's Computer Vision cloud-based API to recognize objects in images and return a text description of an image. Out of 25 images in a randomly sampled set of Instagram image posts, 14 had responses that were true positives. 4) Key outcomes or other accomplishments realized We built the core component of a system that can automatically respond to messages about food with relevant information about food storage in an effort to reduce end-consumer food waste. We have made progress in the most challenging part of such a system, which is to recognize food-related tweets and the specific foods referenced in them with reasonably high accuracy. Over the course of this project, this research activity included 8 students who were hired as research assistants. The research resulted in two peer-reviewed conference publications, a student presentation at the National Conference on Undergraduate Research, a student's Honors thesis, and a poster at CSUF's annual College of Engineering and Computer Science Student Project and Innovation Expo. 
The poster won the best Computer Science project award in 2023.
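The evaluation figures reported under Objective 3 follow from the standard definitions of precision, recall, and F1. The sketch below is illustrative only: the confusion counts are hypothetical values chosen to give a precision and recall of 0.80, not the project's actual counts, and the function name is not taken from the project's code.

```python
def precision_recall_f1(true_pos, false_pos, false_neg):
    """Compute precision, recall, and F1 from confusion counts."""
    predicted = true_pos + false_pos   # entities the method flagged as food
    actual = true_pos + false_neg      # entities hand-labeled as food
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical counts (assumption): 80 correct, 20 spurious, 20 missed.
precision, recall, f1 = precision_recall_f1(true_pos=80, false_pos=20, false_neg=20)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")

# Recall gain of the iterative method over the first method (reported values).
recall_gain = round(0.80 - 0.52, 2)
print(f"recall gain={recall_gain:.2f}")

# Share of true positives among the 25 sampled Instagram image posts.
instagram_rate = 14 / 25
print(f"Instagram true-positive share={instagram_rate:.2f}")
```

Because F1 is the harmonic mean of precision and recall, equal precision and recall of 0.80 give an F1 of 0.80, which matches the reported value for the iterative method.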
Publications
- Type:
Theses/Dissertations
Status:
Published
Year Published:
2022
Citation:
L. Batla, Targeted Messaging about Food Storage in Social Media Posts, University Honors Program thesis, California State University, Fullerton. 2022
- Type:
Other
Status:
Other
Year Published:
2023
Citation:
A. Shah, J. Bui, B. Cortez, A. Panangadan, Automatically Recognizing Mentions of Food in Social Media Posts for Targeted Messaging about Food Storage, Poster at 2023 California State University, Fullerton College of Engineering and Computer Science Student Project and Innovation Expo.
Progress 09/01/21 to 08/31/22
Outputs Target Audience: The main effort completed this year was the organization of the third semester-long training program for undergraduate students in the use of the R computer programming language for text processing. Students met with the PD and two student assistants once a week, for one hour, for 14 weeks. The target audience was juniors and seniors in the following colleges at CSU Fullerton:
• Engineering and Computer Science
• Natural Sciences and Mathematics
• Humanities and Social Sciences
• Health and Human Development
25 students were selected from these colleges to participate in the program. The majors were as follows: Public Health (7), Biology (5), Computer Science (4), Child and Adolescent Development (2), Kinesiology (2), Mathematics (2), Business (1), Chemistry (1), Computer Engineering (1), Electrical Engineering (1), Nursing (1). (Note: the total number of majors does not equal the number of students because two students had double majors.)
Changes/Problems:
Nothing Reported
What opportunities for training and professional development has the project provided? The PD conducted the third installment of the semester-long training program to teach text analysis using the R computer programming language in Spring 2022. 25 undergraduate students participated in the program. The majority of the students were from majors outside engineering and computer science, and most of them had little prior computer programming experience. More than half of the students were female, compared with the less than 20% of students enrolled in computer science who are female. In addition, two undergraduate students were hired as research/teaching assistants. These students helped training program participants complete their assignments and later participated in research to develop food entity recognition methods.
How have the results been disseminated to communities of interest? Our research on the iterative method to identify food entities in social media messages has been published after peer review at the 23rd IEEE International Conference on Information Reuse and Integration for Data Science (IRI). The work was also orally presented at the conference: 23rd IEEE International Conference on Information Reuse and Integration for Data Science (IRI), 2022. Title: "Iterative Approach for Novel Entity Recognition of Foods in Social Media Messages." Authors: B. Chenze, E. Lee, and A. Panangadan. The work was also presented as a poster at the annual CSU Fullerton College of ECS Student Projects Showcase.
What do you plan to do during the next reporting period to accomplish the goals? In the last 6 months of the project, the following tasks will be performed:
- Continue research into methods for food entity recognition; evaluate the performance of this approach on image-based Instagram posts in addition to text messages on Twitter.
- Make teaching material more visually attractive.
- Disseminate findings in venues inside and outside CSU Fullerton.
Impacts What was accomplished under these goals?
Impact statement: In the third year of this project, the PD repeated the semester-long training program to teach text analysis using a computer programming language. 25 undergraduate students from diverse disciplines participated in the program, learned how to analyze text data, and were able to complete text analysis projects on a topic of their choice. As in previous years, students were selected primarily based on their interest in working on sustainability areas, rather than their prior experience with computer programming. As a result, the majority of the students were from majors outside engineering and computer science and most of them had little prior computer programming experience. More than half of the students were female; this is in comparison to the less than 20% of enrolled students who are female in computer science. At the end of the training program, students were able to quantify trends in social media messages on a topic of their interest. The PD conducted research with students to develop a method that can automatically identify when a social media message is about food. The goal of this research is to enable replies, containing tips on reducing food spoilage (from the FoodKeeper dataset provided by the USDA Food Safety and Inspection Service), to be automatically generated. The team improved upon the food entity recognition method that was developed in the previous year and results have been published in a peer-reviewed conference. Objective 1: 1) Major activities completed / experiments conducted The PD conducted the semester-long training program during the Spring 2022 semester as planned. This was the third time the program was offered to students. The program prioritized recruiting students based on their interest in sustainability and careers in this field rather than on their prior programming experience. Students met once a week for one hour with the PD, who acted as the instructor, and two student teaching assistants. 
The sessions were held in person after the first two weeks, but students had the option to join over Zoom if they had concerns because of the pandemic. In each session, the instructor introduced a few topics related to computer data analysis using the R programming language. Each concept was taught in a "hands-on" manner - students completed programming assignments in small groups (in Zoom breakout rooms or with fellow students in the classroom). The PD and student assistants moved between groups/breakout rooms to answer questions and help students with the assignments. At the end of the semester, student groups completed a text data analysis project on a topic of their choice. 2) Data collected Most students downloaded tweets containing keywords related to sustainability (composting, electric vehicles) and the ongoing pandemic. 3) Summary statistics and discussion of results 25 undergraduate students were selected to participate in the program. The majors were as follows: Public Health (7), Biology (5), Computer Science (4), Child and Adolescent Development (2), Kinesiology (2), Mathematics (2), Business (1), Chemistry (1), Computer Engineering (1), Electrical Engineering (1), Nursing (1). (Note: the total number of majors does not equal the number of students because two students had double majors.) 4) Key outcomes or other accomplishments realized 25 undergraduate students from different disciplines learned how to analyze text data using computer programming. Two undergraduate students worked as research/teaching assistants. Research conducted with students on text processing resulted in a peer-reviewed publication and a presentation at the annual College of Engineering and Computer Science Projects Showcase. 
Objective 2: 1) Major activities completed / experiments conducted The PD continued to update the teaching material - a set of PowerPoint slides and a sequence of short computer programming assignments that are meant to be completed by a student while following the PowerPoint presentation. We made the following modifications to the teaching material: 1) added the Food Loss dataset from the Food and Agriculture Organization (FAO), 2) added a new set of computer programming assignments that make use of the FAO Food Loss dataset, and 3) made the PowerPoint slides more visually attractive. 2) Data collected We identified a new publicly available dataset, the Food Loss dataset from the Food and Agriculture Organization (FAO), to be used as a teaching aid. 3) Summary statistics and discussion of results In addition to the previously collected datasets, the new FAO dataset enables students to work on programming assignments using data that is directly relevant to their interest in reducing food waste. 4) Key outcomes or other accomplishments realized The updated teaching material can be used to teach students, starting from the very basics of programming in the R language and progressing to the more specialized topic of analyzing text datasets. Objective 3: 1) Major activities completed / experiments conducted We continued development of methods to automatically identify social media messages about food. Our method will enable "food entity recognition", the computational task of identifying words or phrases in English text that correspond to specific foods. In the third year of the project, the PD and two undergraduate student research assistants extended the Snowball approach to enable recognition of keywords describing foods in the unstructured text that is typical of social media. This approach uses a set of food-related keywords from the FoodKeeper dataset, a dataset provided by USDA's Food Safety and Inspection Service. 
The iterative approach starts with dataset messages that are most likely to be about some food. Likelihood is based on the number of keywords that appear in a message. We tested this approach by identifying food entities in messages on the Twitter network. We evaluated the accuracy of this method and published quantitative results in a peer-reviewed conference. In the third year of the project, we also looked into extending this method to image-based messages posted to the Instagram social media network, to see if it is possible to reliably identify food-related posts. 2) Data collected The following datasets were used in this research activity. FoodKeeper application from USDA's Food Safety and Inspection Service. 39,000 food-related tweets from a publicly available collection of 1.6 million tweets. A publicly available dataset of Instagram influencers containing over 10 million posts, including images (https://sites.google.com/site/sbkimcv/dataset/instagram-influencer-dataset). 3) Summary statistics and discussion of results We evaluated the precision and recall of the "Snowball" food entity recognition method on a hand-labeled subset of 80 tweets. The approach achieved a precision of 0.80, and a recall of 0.80 (f1-score of 0.80). This represents a significant increase of 0.28 in recall performance compared to our previous approach. We have also begun to quantify the accuracy of detection of food images in Instagram posts. 4) Key outcomes or other accomplishments realized We continued to build the components of a system that can ultimately be used to determine whether tweets are food-related. Successful development of such a system will enable relevant responses, containing tips for reducing food waste, to be automatically sent to the poster of the message. We continued to make progress in the most challenging part of such a system, which is to recognize food-related tweets and specific foods referenced in them with high accuracy.
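The starting step described under Objective 3, ranking messages by how many food keywords they contain, can be sketched as follows. This is a minimal illustration and not the published method: the keyword set is a hypothetical stand-in for terms drawn from the FoodKeeper dataset, the tokenizer is a simple regular expression, and the function names are invented for this example.

```python
import re


def keyword_hits(message, keywords):
    """Count the distinct keywords that appear as whole words in a message."""
    tokens = set(re.findall(r"[a-z']+", message.lower()))
    return len(tokens & keywords)


def rank_by_food_likelihood(messages, keywords):
    """Return (hit_count, message) pairs, most likely food-related first."""
    scored = [(keyword_hits(message, keywords), message) for message in messages]
    # sorted() is stable, so messages with equal counts keep their input order.
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Hypothetical keywords and messages (assumptions for illustration only).
food_keywords = {"chicken", "rice", "milk", "bread"}
sample_messages = [
    "Leftover chicken and rice for lunch",
    "Traffic is terrible today",
    "Is this milk still good?",
]

ranked = rank_by_food_likelihood(sample_messages, food_keywords)
for hits, message in ranked:
    print(hits, message)
```

An iterative method would take the top-ranked messages as its first batch and revisit the remainder in later rounds; how the keyword set or model is updated between rounds is not specified here and is left out of the sketch.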
Publications
- Type:
Conference Papers and Presentations
Status:
Published
Year Published:
2022
Citation:
B. Chenze, E. Lee, and A. Panangadan. "Iterative Approach for Novel Entity Recognition of Foods in Social Media Messages." In 2022 IEEE 23rd International Conference on Information Reuse and Integration for Data Science (IRI), pp. 126-131. IEEE, 2022.
Progress 09/01/20 to 08/31/21
Outputs Target Audience: The main effort completed this year was the organization of a semester-long training program for undergraduate students in the use of the R computer programming language for text processing. Students met with the PD and two student assistants once a week, for one hour, for 14 weeks. The target audience was juniors and seniors in the following colleges at CSU Fullerton:
• Engineering and Computer Science
• Natural Sciences and Mathematics
• Humanities and Social Sciences
• Health and Human Development
31 students were selected from these colleges to participate in the program. The majors were as follows: Biology (8), Public Health (7), Computer Science (4), Biochemistry (2), Civil Engineering (2), Computer Engineering (2), Kinesiology (2), Mathematics (2), Chemistry (1), English (1), Mechanical Engineering (1), Nursing (1). (Note: the total number of majors does not equal the number of students because two students had double majors.) 20 of the students were female; 11 were male.
Changes/Problems:
Nothing Reported
What opportunities for training and professional development has the project provided? The PD conducted a semester-long training program to teach text analysis using the R computer programming language. 31 undergraduate students from diverse disciplines participated in the program. The majority of the students were from majors outside engineering and computer science, and most of them had little prior computer programming experience. More than half of the students were female, compared with the less than 20% of students enrolled in computer science who are female. Undergraduate students outside the computing sciences are unlikely to learn text analysis skills while at the university. However, online social networks contain unstructured text data, and such datasets will need to be analyzed for agricultural applications, for instance, to understand consumer behavior. At the end of the training program, students were able to quantify trends in social media messages on a sustainability-related topic of their interest.
How have the results been disseminated to communities of interest? Our research on a method to respond automatically to social media messages about food with relevant information about food safety and preparation has been published after peer review at the 22nd IEEE International Conference on Information Reuse and Integration for Data Science (IRI). The work was also orally presented at the conference. E. Lee, B. Chenze and A. Panangadan, "Encouraging Sustainability Practices through Entity Recognition of Food Items on Social Media," 2021 IEEE 22nd International Conference on Information Reuse and Integration for Data Science (IRI), 2021, pp. 263-266, doi: 10.1109/IRI51335.2021.00042.
What do you plan to do during the next reporting period to accomplish the goals? In Year 3 of the project, the following tasks will be performed:
- Conduct a third installment of the semester-long text processing training program (in the Spring 2022 semester).
- Continue research into food entity recognition.
- Develop a proof-of-concept system to demonstrate how replies with tips on reducing food spoilage can be automatically generated for use on the Twitter social media platform.
- Disseminate findings in conferences and symposia on food waste.
Impacts What was accomplished under these goals?
Impact statement: In the second year of this project, the PD conducted a semester-long training program to teach text analysis using a computer programming language. 31 undergraduate students from diverse disciplines participated in the program, learned how to analyze text data, and were able to complete text analysis projects on a topic of their choice. A majority of the students were from majors outside engineering and computer science and most of them had little prior computer programming experience. More than half of the students were female; this is in comparison to the less than 20% of enrolled students who are female in computer science. Text processing is considered an "advanced" data analysis topic and students outside the computing sciences are unlikely to learn these skills while at the university. However, online social networks contain unstructured text data and such datasets will need to be analyzed for agricultural applications, for instance, to understand consumer behavior. At the end of the training program, students were able to quantify trends in social media messages on a topic of their interest. The PD and student researchers are also building the components of a system that can ultimately be used to determine whether social media messages are food-related in real-time. When a message is recognized to be related to food, a reply will be automatically generated with tips on reducing food spoilage using the FoodKeeper dataset (a dataset provided by the USDA Food Safety and Inspection Service). The team has made significant progress in the most challenging part of such a system, which is to recognize food-related tweets and specific foods referenced in them with high accuracy. Objective 1: 1) Major activities completed / experiments conducted The PD conducted the semester-long training program during the Spring 2021 semester as planned. This was the second time the program was offered to students. 
The program recruited students based on their interest in sustainability and careers in this field rather than their prior programming experience. Students met once a week for one hour with the PD, who acted as the instructor, and two student teaching assistants. All sessions were held online over Zoom because of the pandemic. In each session, the instructor introduced one topic related to computer data analysis using the R programming language and interspersed the presentation with programming assignments conducted in small groups in Zoom breakout rooms. The PD and student assistants moved between breakout rooms to answer questions and help students with the assignments. At the end of the semester, student groups completed a text data analysis project on a topic of their choice. 2) Data collected All students used tweets collected from Twitter for the end-of-semester projects. Most students downloaded tweets containing keywords related to sustainability (composting, electric vehicles) and the ongoing pandemic. 3) Summary statistics and discussion of results 31 undergraduate students were selected to participate in the program. The majors were as follows: 1. Biology (8) 2. Public Health (7) 3. Computer Science (4) 4. Biochemistry (2) 5. Civil Engineering (2) 6. Computer Engineering (2) 7. Kinesiology (2) 8. Mathematics (2) 9. Chemistry (1) 10. English (1) 11. Mechanical Engineering (1) 12. Nursing (1) (Note: the total number of majors does not equal the number of students because two students had double majors.) 20 of the students were female; 11 were male. This compares to the less than 20% of students in the BS in Computer Science program at CSU Fullerton who are female. 4) Key outcomes or other accomplishments realized 31 undergraduate students from different disciplines learned how to analyze text data using computer programming. Two students were hired as research/teaching assistants. 
One student was an undergraduate student in Computer Science; the other was a graduate student in Computer Science. Research conducted with these two students on text processing resulted in a peer-reviewed publication. Objective 2: 1) Major activities completed / experiments conducted The PD continued to update the teaching material. The material includes PowerPoint slides and a set of programming assignments that are meant to be completed by a student while following the PowerPoint presentation. The programming assignments use the built-in datasets in R and also use text data that is more current and relevant to a student's daily life, such as tweets about the university. 2) Data collected We added to the collection of datasets that could be used for the short programming assignments, student projects, and research. The additional datasets were: A collection of tweets containing keywords on a sustainability topic that was of interest to a student group, e.g., electric vehicles. A publicly available collection of 1.6 million tweets from 2009. The dataset has a large variety of tweets and specifically includes both food-related and non-food-related tweets. 3) Summary statistics and discussion of results The large number and variety of tweets in the publicly available tweets dataset enabled us to use it as both training and validation sets for entity recognition. To identify food-related words in a tweet, we made use of the keywords in USDA's Food Safety and Inspection Service FoodKeeper dataset (collected in the previous year). 4) Key outcomes or other accomplishments realized We now have a wider variety of teaching material to teach undergraduate students to analyze text data using the R computer programming language. The material includes more data specific to food waste applications. 
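The keyword lookup mentioned under Objective 2 can be pictured with a minimal sketch. It is written in Python purely for illustration (the training program itself used R), and it is not the project's entity recognition method: it only checks whether a tweet contains a known food name as a whole word. The keyword set `FOOD_KEYWORDS` and the function names are hypothetical; the real FoodKeeper dataset is far larger and is not reproduced here.

```python
import re

# Hypothetical sample of food names; NOT the actual FoodKeeper contents.
FOOD_KEYWORDS = {"milk", "bread", "chicken", "avocado", "ground beef"}


def find_food_mentions(tweet, keywords=FOOD_KEYWORDS):
    """Return the sorted list of keywords that occur as whole words in the tweet."""
    text = tweet.lower()
    found = []
    for keyword in keywords:
        # Word boundaries keep "bread" from matching inside "breadth".
        pattern = r"\b" + re.escape(keyword) + r"\b"
        if re.search(pattern, text):
            found.append(keyword)
    return sorted(found)


def is_food_related(tweet, keywords=FOOD_KEYWORDS):
    """Treat a tweet as food-related if it mentions at least one keyword."""
    return len(find_food_mentions(tweet, keywords)) > 0
```

A plain lookup like this misses foods that are absent from the list and ignores context, which is one reason a learned entity recognition model is attractive for this task.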
Objective 3: 1) Major activities completed / experiments conducted We are conducting research to develop a method that responds automatically to social media messages about food with relevant information about food safety and preparation, in an effort to reduce food waste by end consumers. The main challenge is "entity recognition", the computational task of identifying words or phrases in natural language text that correspond to real-world objects such as foods. Current entity recognition methods can recognize only a relatively small set of entity types. We are developing a method to extend existing entity recognition systems to recognize foods without requiring a large labeled dataset. The method uses the FoodKeeper dataset, which is provided by USDA's Food Safety and Inspection Service and contains information on a variety of foods. Our work is designed to enable recognition of novel entity types, such as food, in the unstructured text that is typical of social media. 2) Data collected We used USDA's Food Safety and Inspection Service FoodKeeper dataset and the 1.6 million tweet dataset described in Objective 2. 3) Summary statistics and discussion of results We evaluated the precision and recall of the entity recognition method on a hand-labeled subset of tweets. The system achieved a precision of 0.80 and a recall of 0.80 (F1-score of 0.80) on this dataset. Previous studies have reported a model that achieved a precision of 0.96, a recall of 0.52, and an F1-score of 0.68. Our approach thus achieved a 0.28 increase in recall. Beyond the higher recall and F1-score, the model also makes better use of the context of food entities in tweets than a baseline model. 4) Key outcomes or other accomplishments realized We are building the components of a system that can ultimately be used to determine whether tweets are food-related. 
When a tweet is recognized to be related to food, an insightful response can then be posted informing the user of ways to maximize the longevity of that food. We have made progress in the most challenging part of such a system, which is to recognize food-related tweets and specific foods referenced in them with high accuracy.
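For readers less familiar with the metrics reported under Objective 3, the following minimal sketch shows how precision, recall, and the F1-score relate. It is written in Python for illustration only and uses the standard definitions; the function names are assumptions of this sketch, and the inputs are the rounded figures quoted above rather than the underlying counts, which this report does not give.

```python
def precision_recall(true_pos, false_pos, false_neg):
    """Compute precision and recall from counts of labeled entities."""
    predicted = true_pos + false_pos   # entities the system flagged
    actual = true_pos + false_neg      # entities in the hand-labeled data
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    return precision, recall


def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rounded values quoted in the report text.
ours_f1 = f1_score(0.80, 0.80)
prior_f1 = f1_score(0.96, 0.52)
recall_gain = round(0.80 - 0.52, 2)
```

With these rounded inputs, the prior model's F1-score works out to roughly 0.67, slightly below the reported 0.68; a gap of that size is consistent with rounding of the published precision and recall.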
Publications
- Type:
Conference Papers and Presentations
Status:
Published
Year Published:
2021
Citation:
E. Lee, B. Chenze and A. Panangadan, "Encouraging Sustainability Practices through Entity Recognition of Food Items on Social Media," 2021 IEEE 22nd International Conference on Information Reuse and Integration for Data Science (IRI), 2021, pp. 263-266, doi: 10.1109/IRI51335.2021.00042.
Progress 09/01/19 to 08/31/20
Outputs Target Audience: The main effort that was completed this year was the organization of a semester-long training program for undergraduate students in the use of a computer programming language for text processing. Students met once a week for 14 weeks, for one hour per week, with the PD and two teaching assistants. The target audience was juniors and seniors in the following colleges in CSU Fullerton: • Engineering and Computer Science • Natural Sciences and Mathematics • Humanities and Social Sciences • Health and Human Development 29 students were selected from these colleges to participate in the program. The majors were as follows: Computer Science (7) Mathematics (5) Psychology (4) Biology (3) Computer Engineering (3) Kinesiology (1) English (1) Spanish teacher (1) History (1) Accounting (1) Public Health (1) (Note: the total number of majors does not equal the number of students because some students had double majors.) 15 of the students were male; 14 were female. Changes/Problems:
Nothing Reported
What opportunities for training and professional development has the project provided? 29 undergraduate students from different disciplines learned how to analyze text data using a computer programming language. Two students learned of USDA activities and opportunities (e.g., internships) by attending the American Association of Hispanics in Higher Education (AAHHE) conference in Costa Mesa, California. Two students developed teaching skills by working as teaching assistants for the program. One graduate student completed her MS thesis under the supervision of the PD. One undergraduate student completed an Independent Study under the supervision of the PD. This student then went on to join a graduate program. How have the results been disseminated to communities of interest? A flyer was prepared advertising the benefits of participating in the text analysis training program. The flyer was vetted by Sergio Guerra, Director of the Center for Academic Support in Engineering & Computer Science (CASECS), CSU Fullerton. The flyer was distributed via the university's advisors email distribution system to all juniors and seniors in the colleges of Engineering and Computer Science, Natural Sciences and Mathematics, Humanities and Social Sciences, and Health and Human Development. What do you plan to do during the next reporting period to accomplish the goals?
Nothing Reported
Impacts What was accomplished under these goals?
Impact statement: By the end of the first year of this project, the PD developed and conducted a semester-long training program to teach text analysis using a computer programming language. 29 undergraduate students from diverse disciplines participated in the program, learned how to analyze text data, and were able to complete text analysis projects on a topic of their choice. Text processing is considered an "advanced" data analysis topic and students outside the computing sciences are unlikely to learn these skills while at the university. However, emerging datasets from a variety of sources, including online social networks, contain unstructured text data and such datasets will need to be analyzed for agricultural applications, for instance, to understand consumer behavior. This program thus provided students with an interest in careers in sustainability the skills to analyze this type of data. The students were selected based on their written interest in sustainable behaviors and careers in sustainability. Nearly two-thirds were from majors outside engineering and computer science and most of these students had little prior computer programming experience. Nearly half of the students were female; this is in comparison to the less than 20% of enrolled students who are female in computer science. Students worked in a collaborative inter-disciplinary setting in weekly meetings over the course of a semester to learn text data processing skills. At the end of the training program, students were able to quantify trends in social media messages on a topic of their interest (e.g., analysis of Twitter tweets on food shortage). 1) Major activities completed / experiments conducted Objective 1: The PD developed and conducted a semester-long training program during the Spring 2020 semester as planned. The program recruited students based on their interest in sustainability and careers in this field instead of their prior programming experience. 
Students met once a week for one hour with the PD, who acted as the instructor, and two student teaching assistants. In each session, the instructor introduced one topic related to computer data analysis using the R programming language and interspersed the presentation with collaborative assignments. The PD and two student assistants would move between groups to answer questions and help the students with the assignments. At the end of the semester, each group completed a text data analysis project on a topic of their choice. The student projects completed at the end of the program were: An Analysis of the 2019 Forest Fires; Food Shortage Tweet Analysis; Mental Health Effects of Lockdown in California; Effects of COVID-19 on the Agriculture Industry; Physical Fitness Mini-project; Analysis of the sentiment toward COVID-19 using Twitter data; Sustainability During COVID-19; Crude Oil Crash Sentiment Analysis; Sustainability and Agriculture - does modern society have a positive or negative outlook on sustainability pertaining to agriculture? Objective 2: The PD developed a curriculum tailored to students with little prior programming experience to learn basic text processing skills. Text processing is typically considered an "advanced" topic and students outside the computing sciences are unlikely to learn these skills while at the university. The teaching material included PowerPoint slides and a set of guided assignments that are meant to be completed while following the PowerPoint presentation. Objective 3: The PD and the student research/teaching assistants identified datasets that could be used for understanding trends in sustainable behavior by using text processing methods. The datasets included: Collection of Twitter tweets that included keywords related to sustainability. A graduate student analyzed this dataset as part of her MS thesis. The dataset underlying the FoodKeeper application from USDA's Food Safety and Inspection Service about food and beverages storage. 
2) Data collected In the first year of the project, the focus was on identifying datasets that could be used as a teaching aid to teach text processing skills. These were: Collection of Twitter tweets that included keywords related to CSU Fullerton. This dataset was used for the group assignments to teach text processing. Collection of Twitter tweets that included keywords related to sustainability. A graduate student analyzed this dataset as part of her MS thesis. The dataset underlying the FoodKeeper application from USDA's Food Safety and Inspection Service about food and beverages storage. The intention is to use this dataset to develop a social media application to promote food waste reduction practices. 3) Summary statistics and discussion of results 29 undergraduate students were selected to participate in the program. The majors were as follows: Computer Science (7) Mathematics (5) Psychology (4) Biology (3) Computer Engineering (3) Kinesiology (1) English (1) Spanish teacher (1) History (1) Accounting (1) Public Health (1) (Note: the total number of majors does not equal the number of students because some students had double majors.) 15 of the students were male; 14 were female. This compares to the less than 20% of female students in the BS in Computer Science program at CSU Fullerton. Two students were hired as research/teaching assistants. One student was an undergraduate student in Computer Science; the other was a graduate student in Computer Science. Both of these students were female. 4) Key outcomes or other accomplishments realized The curriculum and teaching material for a program to teach undergraduate students to analyze text data using a computer programming language were developed. The material is designed for students who have little prior experience with computer programming. 29 undergraduate students from different disciplines learned how to analyze text data using a computer programming language. 
Datasets were identified that will be used as teaching aids and for student research projects to develop applications to promote food waste reduction by consumers.
Publications