Progress 10/24/14 to 09/30/16
Outputs Target Audience: Communication social scientists specializing in text-analytic methods; undergraduate students interested in doing research on social media and food-related topics; members of the public interested in understanding attitudes about local food in their communities. Changes/Problems: Obtaining the full Yelp data for all businesses in New York State took substantially longer than anticipated. As described in the previous section, the data were both more poorly organized (with overlapping town features) and more restrictively provided (because of Yelp's limits on web scraping) than initially anticipated. As a result, developing an appropriate script took longer and required more sophisticated programming skills than initially thought. The script was not completed and tested until fall 2015. The scraping process itself also took several weeks. What opportunities for training and professional development has the project provided? During the project we have engaged: 1) Two master's students in information sciences to obtain the Yelp data. I worked with each of them to improve their understanding of how the data they gathered feed into social scientific inquiry. One student worked with me to develop the basis for measuring whether a reviewer was "local" to the business they were reviewing. 2) One doctoral student specializing in linguistic analysis/text analysis. The student worked in summer 2015. He had previous experience with text/linguistic analysis but no prior experience working with social media data or the R programming language; he developed skills in both areas during the project. 3) One doctoral student specializing in intergroup communication. The student worked in summer 2016. He had relevant theoretical experience. On the project he learned R programming as well as web scraping for metadata (such as community zip codes).
4) One undergraduate student undertook the data visualization project; as part of it he learned D3.js and produced an interactive web visualization of our data. 5) Two undergraduate students, one in communication and one in computer science, became interested in the project and have been conducting their own investigation of how people talk about food on social media, looking at Instagram posts about Ithaca food establishments, including Cornell dining halls. Their work was conducted as an independent study and did not require funds; however, their analyses may be used as part of the project depending on what they find. During the fall semester (they began work in November 2015), they learned skills related to the collection and social scientific analysis of social media data. How have the results been disseminated to communities of interest? Scholarly communities: Results of our analyses of local vs. visiting reviewers were presented at the International Communication Association (ICA) conference in Fukuoka, Japan, in June 2016. Follow-up to this work is under review at Communication Research, a leading journal in the field of communication. Results of our analysis of expectations and socioeconomic status have been submitted to this year's ICA conference. Public: We have created a data visualization of Yelp review text by town/city in New York State. It is available on PI Margolin's website. What do you plan to do during the next reporting period to accomplish the goals?
Nothing Reported
Impacts What was accomplished under these goals?
Goal #1 An assumption of our approach is that the audience for Yelp reviews influences the text that reviewers choose to write. It is through this influence that we hope to infer community properties from the way that people write about objects (e.g., restaurants) in a community. Our research does uncover such a relationship. Consistent with our theoretical expectations, reviewers modify their language choices based on the extremity of their review. Specifically, more extremely positive and negative reviews are presented in rhetorically more defensible language, using more abstractions and first-person constructions. Importantly, however, we find that these differences depend on who is writing the review. Inspection of the Yelp data indicates that a substantial number of reviews are written by individuals from "out of town." These reviews still reflect a normative sense of the community (what its identity is and how it relates to appropriate behaviors within and toward it), but from a different perspective. We find that local reviewers tend to write reviews in a more dynamic, storytelling style than "visiting" reviewers. The signature of this style in text is the use of fewer definite articles and more abstract phrases. This work was submitted to Communication Research ("A Multi-Theoretical Approach to Big Text Data: Comparing Expressive and Rhetorical Logics in Yelp Reviews"). Goal #2 We ran several analyses comparing CSA density (the number of CSAs within 10 and 25 miles of the community where a restaurant is located) to Yelp review data for that restaurant, parsed for simple discourse metrics. These analyses did not reveal any strong correlations. We suspect this relates to our findings from Goal #1, which concern the heterogeneity of the users participating. In particular, our findings indicate that local reviewers are likely to respond differently than visiting reviewers.
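The report does not say how CSA density was computed beyond the two radii. One plausible way to count CSAs within 10 and 25 miles of a community, and then correlate that count with a per-restaurant discourse metric, is sketched below. It assumes that CSA locations and community centroids are available as latitude/longitude pairs and that straight-line (great-circle) distance is an acceptable proxy for proximity; the function names are hypothetical.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius (assumed spherical Earth)


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def csa_density(community, csa_points, radius_miles):
    """Hypothetical helper: number of CSA locations within radius_miles of a community centroid."""
    lat, lon = community
    return sum(1 for (c_lat, c_lon) in csa_points
               if haversine_miles(lat, lon, c_lat, c_lon) <= radius_miles)


def pearson(xs, ys):
    """Pearson correlation between CSA density and a review metric."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when either variable is constant")
    return sxy / math.sqrt(sxx * syy)
```

For example, `csa_density((42.44, -76.50), csa_points, 10)` would count the CSAs within 10 miles of a point near Ithaca. Computing the count at both radii for every restaurant's community and passing it, together with a discourse metric, to `pearson` mirrors the kind of comparison described above.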
In addition, we observed some evidence that the socioeconomic status of the community may play a role. We have thus expanded our analysis of this factor (see Goal #4 below). Overall, the Yelp discourse data are more heterogeneous and complex than we anticipated. We need to understand their properties more fully before we can conclusively test ideas about their relationship to CSA density. Goal #3 While we have not yet discovered a meaningful index of receptivity, we have produced a data visualization of simple review metrics by community in New York State. The visualization is available to the public at http://margolin.cac.cornell.edu/Yelp_Projection_Porter/Yelp_Projection.html Goal #4 One feature that was important in our CSA analysis was the socioeconomic status of the community. Three competing social psychological theories make predictions about how the group status (in this case, socioeconomic status) of an object (in this case, a restaurant) relates to how it is reviewed: expectancy-violation theory, the black sheep effect, and the extremity-complexity model. Each of these models relates a reviewer's expectation of an experience to their actual experience. We are currently investigating which of these theories is dominant in the context of Yelp restaurant reviews, and under what conditions. The goal of this research is to better understand how written discourse reflects the expectations that a reviewer had for a restaurant before experiencing it. Inferring these expectations is key to inferring the normative identity of the community. For example, when reviewers expect food to be "locally sourced," they will be disappointed when it is not. By contrast, when they have no such expectation, they may be pleasantly surprised when it is. These expectations influence the reviewer's experience and how they write the review, but they also reflect a social understanding of that restaurant in its community.
That is, in locations where no one expects the food to be local, it is unlikely that local food will be strongly valued (even if it is lauded when provided). Rather, local food should be prized in places where reviewers write as though localness is expected, even if it is not delivered in a particular experience. Our preliminary findings have been submitted to the International Communication Association Annual Conference (San Diego, California, 25-29 May 2017) as "A Taste of Discrimination."
Publications
- Type:
Journal Articles
Status:
Under Review
Year Published:
2017
Citation:
"A Multi-Theoretical Approach to Big Text Data: Comparing Expressive and Rhetorical Logics in Yelp Reviews" (revision requested for resubmission). Communication Research.
- Type:
Conference Papers and Presentations
Status:
Submitted
Year Published:
2017
Citation:
A Taste of Discrimination (2017). Submitted to International Communication Association Annual Conference (San Diego, California 25-29 May 2017)
- Type:
Conference Papers and Presentations
Status:
Submitted
Year Published:
2017
Citation:
Tweeting Climate Change: Who or What Motivates Politicians to Address The Topic? (2017). Submitted to International Communication Association Annual Conference (San Diego, California 25-29 May 2017)
- Type:
Conference Papers and Presentations
Status:
Accepted
Year Published:
2016
Citation:
Margolin, D., & Markowitz, D. (2016). You Write What You Eat: Linguistic Style, Ratings, and Locale of Yelp Reviews. Presented to International Communication Association Annual Conference (Fukuoka, Japan, 9-13 June 2016).
Progress 10/24/14 to 09/30/15
Outputs Target Audience: Communication social scientists specializing in text-analytic methods; undergraduate students interested in doing research on social media and food-related topics. Efforts: Our goal is to identify meaningful indicators of cohesive local communities that value local products, using social media data. Since social media data are primarily text-based, our first effort was to develop a means of detecting local community orientation with text-analytic methods. Thus our first-year efforts focused on getting feedback on our work from social scientists who specialize in these techniques. To address this audience, we prepared and submitted a manuscript, "You Write What You Eat: Linguistic Style, Ratings, and Locale of Yelp Reviews," to the International Communication Association annual conference. We also engaged two undergraduate students who are interested in using social media data to better understand the relationship between social media communication and attitudes about food and food-providing establishments, including restaurants and dining halls. They work with the PI as part of an independent study, gathering data about Ithaca-area restaurants and markets on Instagram, and are receiving training in scraping and analyzing social media data as part of this project. Changes/Problems: Obtaining the full Yelp data for all businesses in New York State took substantially longer than anticipated. As described in the previous section, the data were both more poorly organized (with overlapping town features) and more restrictively provided (because of Yelp's limits on web scraping) than initially anticipated. As a result, developing an appropriate script took longer and required more sophisticated programming skills than initially thought. The script was not completed and tested until fall 2015. The scraping process itself also took several weeks. The data have now been obtained, however.
What opportunities for training and professional development has the project provided? During the project we have engaged: 1) Two master's students in information sciences to obtain the Yelp data. I have worked with each of them to improve their understanding of how the data they gathered feed into social scientific inquiry. One student worked with me to develop the basis for measuring whether a reviewer was "local" to the business they were reviewing. 2) One doctoral student specializing in linguistic analysis/text analysis. The student worked in summer 2015. He had previous experience with text/linguistic analysis but no prior experience working with social media data or the R programming language; he developed skills in both areas during the project. 3) Two undergraduate students, one in communication and one in computer science, became interested in the project and have been conducting their own investigation of how people talk about food on social media, looking at Instagram posts about Ithaca food establishments, including Cornell dining halls. Their work was conducted as an independent study and did not require funds; however, their analyses may be used as part of the project depending on what they find. During the fall semester (they began work in November 2015), they learned skills related to the collection and social scientific analysis of social media data. How have the results been disseminated to communities of interest? Results of the analyses of self-presentation and construal level theory (CLT) have been disseminated to the communication scholarly field through a paper submitted to the International Communication Association annual conference.
What do you plan to do during the next reporting period to accomplish the goals? The next phase of the project entails three main steps: 1) applying the text-analytic techniques developed on the Yelp academic dataset to the (now complete) New York State dataset; 2) comparing the text-analytic features of each community to the population of CSAs in different local areas of New York State; and 3) creating the receptivity index.
Impacts What was accomplished under these goals?
Related to Goal #1 and Goal #4 Our first analysis looked at variations in food-related dialogue across different communities. To do this we began with the Yelp academic dataset, which contains reviews of businesses, primarily restaurants, across 26 communities around the country. Our goal was to identify the textual signature of allegiance to local establishments, as this should signal a preference for local food. We found significant effects in line with both self-presentation theory and construal level theory. According to self-presentation theory (Bazarova et al., 2012), people are more concerned with how they will be judged by audiences who are socially close to them. One of the important signatures of self-presentation is the injection of positive emotion words or "pro-social" words into communication. We analyzed the use of positive emotion words using the Linguistic Inquiry and Word Count (LIWC) dictionary. We first controlled for review rating (the score that reviewers give to a restaurant or other establishment), because positive emotion is strongly correlated with the rated experience. This allows us to see whether certain reviews contain more positive emotion than would be expected from their rating alone. Such reviews are indicative of self-presentation processes: the reviewers are adding "extra" positive emotion in order to appear a certain way to their audience. We then compared local reviewers (reviewers whose review history indicates they live in the same community as the business being reviewed) to visiting reviewers in terms of this self-presentation, that is, the use of "extra" positive emotion. The effect was significant: in each of the 26 communities, local reviewers tended to add more "extra" positive emotion to their reviews than visiting reviewers. This serves as our first linguistic indicator of the influence of "localness."
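The "extra" positive emotion measure can be read as a residual: fit positive emotion on rating, then compare the average residual of local and visiting reviewers within each community. The sketch below assumes each review is already scored with a LIWC-style positive emotion percentage and tagged as local or visiting. The single pooled least-squares fit and the modal-city locality rule (`most_common_city`) are hypothetical simplifications for illustration, not the project's actual model.

```python
from collections import Counter, defaultdict


def most_common_city(review_cities):
    """Hypothetical locality rule: treat the city a reviewer reviews most often as home."""
    return Counter(review_cities).most_common(1)[0][0]


def fit_ols(xs, ys):
    """Least-squares fit of y = a + b * x; returns (intercept a, slope b)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def extra_posemo_gap(reviews):
    """Per-community gap in 'extra' positive emotion (local minus visiting).

    Each review is a dict with keys 'city', 'rating', 'posemo' (positive emotion
    words as a percentage of all words) and 'is_local' (bool).
    """
    intercept, slope = fit_ols([r["rating"] for r in reviews],
                               [r["posemo"] for r in reviews])
    residuals = defaultdict(lambda: {True: [], False: []})
    for r in reviews:
        # Residual = positive emotion beyond what the rating alone predicts.
        predicted = intercept + slope * r["rating"]
        residuals[r["city"]][r["is_local"]].append(r["posemo"] - predicted)
    gaps = {}
    for city, groups in residuals.items():
        local, visiting = groups[True], groups[False]
        if local and visiting:  # a comparison needs both groups
            gaps[city] = sum(local) / len(local) - sum(visiting) / len(visiting)
    return gaps
```

Under this sketch, a reviewer would be tagged local when `most_common_city(their_review_cities)` equals the business's city. A positive gap for a community means its local reviewers use more positive emotion than their ratings alone predict, relative to visitors.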
We also found the influence of localness through construal level theory (CLT) (Trope & Liberman, 2010). According to CLT, individuals use more concrete, immediate language when talking about objects that are socially close to them. We find support for this in that reviewers tend to use more present-tense verbs when talking about local businesses than when talking about businesses outside their primary community. This serves as our second linguistic indicator. Importantly, for each of these indicators we find significant variation across communities; that is, self-presentation and construal operate more strongly in some communities than in others. For example, self-presentation (the injection of extra positive emotion) is significantly weaker in Albany than in Ithaca. Reviewers local to Albany inject an extra 0.15% positive emotion words into their reviews of Albany businesses relative to their reviews of non-Albany businesses, while Ithaca reviewers inject an extra 0.37%. This suggests that Ithacans are more concerned than Albany residents with how other Ithacans will judge the opinions they express. These findings set the stage for our next set of tests, in which we will compare these community-level variations in self-presentation and CLT to observable CSA data. We have also collected the full, publicly available set of Yelp reviews written about businesses in New York State. This was a challenge because Yelp does not serve data directly by state. A custom web-scraping script was written to query Yelp for the reviews written for businesses in each town within New York State. The script then had to check for duplicates, because some businesses are listed under adjacent towns or under nested areas (e.g., New York City and the Bronx). The script also had to avoid overwhelming Yelp's servers by pausing between queries. Developing a feasible script and completing the scraping took substantial effort.
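The two constraints on the scraper described above, de-duplicating businesses that appear under several towns and spacing out requests, can be sketched as follows. This is not the project's script: `fetch_town` is a hypothetical stand-in for whatever code retrieves one town's listings, `business_id` is an assumed unique key, and the 2-second default delay is an arbitrary placeholder rather than a Yelp-specified limit.

```python
import time
from typing import Callable, Dict, Iterable, List


def collect_state_reviews(
    towns: Iterable[str],
    fetch_town: Callable[[str], List[dict]],
    delay_seconds: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, dict]:
    """Query each town in turn, keep each business once, and pause between queries.

    fetch_town (hypothetical) returns a list of business records for one town,
    each carrying a unique 'business_id'.
    """
    businesses: Dict[str, dict] = {}
    for i, town in enumerate(towns):
        if i > 0:
            sleep(delay_seconds)  # throttle: space out consecutive queries
        for record in fetch_town(town):
            # The same business can appear under adjacent towns or nested areas
            # (e.g. New York City and the Bronx); keep only the first copy seen.
            businesses.setdefault(record["business_id"], record)
    return businesses
```

Passing `sleep` as a parameter lets the throttling be tested without actually waiting; in a real run the default `time.sleep` is used.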
Collection of the complete New York State Yelp dataset was completed in November 2015.
Bazarova, N. N., Taft, J. G., Choi, Y. H., & Cosley, D. (2012). Managing impressions and relationships on Facebook: Self-presentational and relational concerns revealed through the analysis of language style. Journal of Language and Social Psychology. https://doi.org/10.1177/0261927X12456384
Trope, Y., & Liberman, N. (2010). Construal-level theory of psychological distance. Psychological Review, 117(2), 440-463. https://doi.org/10.1037/a0018963
Publications
- Type:
Conference Papers and Presentations
Status:
Under Review
Year Published:
2016
Citation:
Margolin, D., & Markowitz, D. (2016). You Write What You Eat: Linguistic Style, Ratings, and Locale of Yelp Reviews. Submitted to International Communication Association Annual Conference (Fukuoka, Japan, 9-13 June 2016).