Source: CLEMSON UNIVERSITY submitted to NRP
DATA SOURCES AND FOOD DEMAND ESTIMATION: A COMPARISON OF HOMESCAN AND CONSUMER EXPENDITURE SURVEY DATA
Sponsoring Institution
National Institute of Food and Agriculture
Project Status
COMPLETE
Funding Source
Reporting Frequency
Annual
Accession No.
0224570
Grant No.
2011-67023-30058
Cumulative Award Amt.
(N/A)
Proposal No.
2010-04811
Multistate No.
(N/A)
Project Start Date
Feb 15, 2011
Project End Date
Feb 14, 2014
Grant Year
2011
Program Code
[A1611]- Foundational Program: Economics of Markets and Development
Recipient Organization
CLEMSON UNIVERSITY
(N/A)
CLEMSON,SC 29634
Performing Department
School of Agricultural, Forest, & Environmental Sciences
Non Technical Summary
As argued by the National Research Council (2005), in the face of increasing health problems such as obesity in the United States, there is a growing need to understand the economic and social factors behind food consumption and nutrition. Research in this area has been restricted to some degree by the difficulty of accessing the necessary data sources and/or by the fact that some data sources do not contain all the variables believed to be important for the analyses. The main objective of this study is to evaluate the potential of combining publicly available datasets with state-of-the-art econometric methods as an alternative to the proprietary ACNielsen Homescan data. Secondary objectives are to cross-validate the results of food demand analysis using alternative data sources and to evaluate procedures to ameliorate biases induced by measurement error. The publicly available data sources include the Bureau of Labor Statistics Consumer Price Index and the Consumer Expenditure Survey. The procedure considered for estimating the demand models is the method proposed in a Journal of Econometrics article by Hoderlein and Mihaleva (2008).
Animal Health Component
40%
Research Effort Categories
Basic
60%
Applied
40%
Developmental
(N/A)
Classification

Knowledge Area (KA)    Subject of Investigation (SOI)    Field of Science (FOS)    Percent
609                    7310                              2090                      50%
607                    5010                              3010                      50%
Goals / Objectives
The main objective of this study is to evaluate the potential of using publicly available datasets and state-of-the-art econometric methods in lieu of the privately owned Homescan data. A secondary objective is to cross-validate the results of food demand analysis (elasticity values and welfare measures) using alternative data sources and to evaluate procedures to ameliorate biases induced by measurement error. In other words, the project will try to answer three interrelated questions: 1) Are there any differences between demand model estimates obtained using Homescan data and BLS data (CEX and CPI data)? If so, 2) What are the sources of the differences? and 3) Are there procedures currently available that can help to eliminate or reduce biases induced by measurement error? The expected outcomes from this project are the following: 1) A validated set of procedures that could allow researchers to use publicly available data for estimating price and income elasticities and performing welfare analysis instead of having to rely on proprietary data sources. 2) A set of updated price and income elasticities for several food products. Elasticities calculated using different data sets can be used for different purposes. For example, evaluating the robustness of results to uncertainty in elasticity values is now common in policy studies (e.g., Carpio and Isengildina-Massa, 2010; Wohlgenant, 2010). This type of analysis requires information on the probability distribution of the elasticities, which can only be inferred if values are available from several sources. 3) Information on the differences and/or similarities of demand response analysis using alternative data sources. These estimates would provide confidence (or doubt) in the use of available data sources for conducting policy analysis regarding consumer responses to a wide range of food industry issues. The project will be conducted in two phases.
Phase 1 will focus on the differences between demand model estimates obtained using Homescan data and BLS data. Phase 1 involves the following steps: data management and preparation (months 1 to 8), estimation of demand systems (months 7 to 14), and comparison of elasticities and welfare measures (months 12 to 19). A report summarizing the results of Phase 1 is planned to be finalized by the end of month 19. Phase 2 of the project will investigate methods to quantify and ameliorate biases induced by measurement error. Phase 2 includes the following steps: data construction (months 15 to 19), demand systems estimation and comparison of estimates (months 18 to 24), estimation of measurement error models (months 24 to 28), and preparation of the Phase 2 report (months 28 to 32). The last 4 months of the project will be used to prepare the final project report and journal articles.
Project Methods
The project will be conducted in two phases. Phase 1 addresses the first research question, which concerns the differences in demand estimates obtained from two random samples of the same population (CEX and Homescan data). The null hypothesis of no differences between CEX and Homescan elasticities will be tested using the two-sample T2 statistic (Gupta et al., 1996). To evaluate the sensitivity of the results to the level of aggregation, the analyses will be conducted at two different levels of aggregation. Phase 2 of the project will investigate methods to quantify and ameliorate biases induced by measurement error. The quantification of measurement error will be carried out by means of simulation procedures. Phase 2 can also be seen as an approach to validate Hoderlein and Mihaleva's (2008) procedure using household level Stone-Lewbel (SL) prices (constructed from regional prices). These authors provide evidence that, relative to the use of regional price indices, the use of SL prices increases price variation, which in turn yields more plausible signs of the demand coefficients and more precise parameter estimates; however, the procedure has not been validated using observed prices. Finally, in Phase 2 we will evaluate the potential of combining publicly available data sources to reduce measurement error. Data and Demand Modeling Details: The data used in this research project are from the BLS's CEX and Detailed Monthly Consumer Price Index (CPI), and from the Nielsen Homescan Panels and the Quarterly Food-at-Home Price Database (QFAHPD) provided by the Economic Research Service (ERS). The analysis is based on the annual CEX and Nielsen Homescan surveys for 2001-2006. This research focuses on two demand systems. First, we consider an aggregate demand system for U.S. food consumption containing 8 commodities. Second, we consider a disaggregate demand system for the fruits and vegetables group containing three commodities.
For estimation purposes we plan to use the recently proposed Exact Affine Stone Index (EASI) demand system (Lewbel and Pendakur, 2009). The main rationale behind the project is a) to eliminate barriers to the analysis of economic and social factors behind food consumption and nutrition; and b) to further investigate both the impact of, and solutions to, the measurement error problems present in the data, which might be biasing the results of empirical analysis. The results of the project have important implications for researchers working in the area of consumer demand analysis. For example, if the results from demand model estimation are found to be invariant to the data set used, this will imply that demand analysis results obtained using publicly available data sources are as good (or bad) as those obtained using Homescan data, thus eliminating the need for researchers to obtain access to the proprietary and costly Homescan data. On the other hand, if demand model estimation results are found to differ across data sets, this will require further investigation of the sources of variation and of procedures to ameliorate estimation biases.
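The two-sample T2 test named in the project methods can be illustrated with a minimal sketch. It assumes the classical Hotelling form with a pooled covariance matrix, and it assumes each row of a sample is one draw of an elasticity vector (for example, a bootstrap replicate). The report does not state which variant of the statistic was used or how the samples were formed, so the function names and the data layout below are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def mean_vector(rows: Sequence[Sequence[float]]) -> Vector:
    """Column means of a sample stored as a list of rows."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def scatter_matrix(rows: Sequence[Sequence[float]], mean: Vector) -> Matrix:
    """Sum of outer products of deviations from the mean."""
    p = len(mean)
    s = [[0.0] * p for _ in range(p)]
    for row in rows:
        d = [x - m for x, m in zip(row, mean)]
        for i in range(p):
            for j in range(p):
                s[i][j] += d[i] * d[j]
    return s


def solve(a: Matrix, b: Vector) -> Vector:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("pooled covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def hotelling_t2(
    sample_a: Sequence[Sequence[float]], sample_b: Sequence[Sequence[float]]
) -> Tuple[float, float, int, int]:
    """Return (T2, F statistic, numerator df, denominator df).

    Assumes both samples share a common covariance matrix (pooled estimate).
    """
    n1, n2 = len(sample_a), len(sample_b)
    m1, m2 = mean_vector(sample_a), mean_vector(sample_b)
    p = len(m1)
    s1, s2 = scatter_matrix(sample_a, m1), scatter_matrix(sample_b, m2)
    pooled = [[(s1[i][j] + s2[i][j]) / (n1 + n2 - 2) for j in range(p)]
              for i in range(p)]
    diff = [a - b for a, b in zip(m1, m2)]
    weighted = solve(pooled, diff)  # pooled^{-1} (mean_a - mean_b)
    t2 = (n1 * n2) / (n1 + n2) * sum(d * w for d, w in zip(diff, weighted))
    df2 = n1 + n2 - p - 1
    f_stat = t2 * df2 / (p * (n1 + n2 - 2))
    return t2, f_stat, p, df2
```

Under the null hypothesis of equal mean vectors (and under normality), the returned F statistic is compared with an F distribution with the two returned degrees of freedom. A p-value is left out because it would require an F distribution function from outside the standard library.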

Progress 02/15/11 to 02/14/14

Outputs
Target Audience: Nothing Reported
Changes/Problems: Nothing Reported
What opportunities for training and professional development has the project provided? Nothing Reported
How have the results been disseminated to communities of interest? Nothing Reported
What do you plan to do during the next reporting period to accomplish the goals? Nothing Reported

Impacts
What was accomplished under these goals? No further accomplishments since the 2012 report, due to the resignation of the P.I. from Clemson University effective June 15, 2013.

Publications


    Progress 02/15/12 to 02/14/13

    Outputs
    Target Audience: The main target audience is scientists working on food policy analysis.
    Changes/Problems: Several factors have delayed the progress of the project, including: 1) Maternity and sick leave of the Co-PI Tullaya Boonsaeng. 2) Maternity and sick leave of the Co-PI Carlos Carpio. 3) Internal re-organization within the University (the 2 Co-PIs have been in three different departments in the last 2 years). As a result, we are now approximately 12 months behind schedule. We plan to request a 1-year no-cost project extension.
    What opportunities for training and professional development has the project provided? Two graduate students were involved in the project. Both became familiar with data management techniques for large datasets as well as with the technical aspects of consumer demand estimation.
    How have the results been disseminated to communities of interest? We presented two papers at the Annual Meetings of the American Agricultural Economics Association, which are attended by professionals in the field. Each session was attended by approximately 30 scientists. The papers are publicly available on the AgEcon Search website. According to the website statistics, the papers were downloaded a total of 179 times during the reporting period.
    What do you plan to do during the next reporting period to accomplish the goals? During the next year we plan to finish Phase 1 and start working on Phase 2 of the project.

    Impacts
    What was accomplished under these goals? The efforts in year 2 focused on writing up the results of the analyses performed in year 1: 1) Estimation of a demand system for eight food commodities using CEX expenditure data and BLS CPI based SL prices, following the Hoderlein and Mihaleva (2008, J. Econometrics) procedure. The procedures proposed by these authors were modified to account for the presence of zero expenditures in the CEX survey data. We explored the robustness of the elasticity estimates to the use of different CPIs (monthly, quarterly and constant prices). 2) Estimation of a demand system for the same group of goods and the same demand system used in Step 1 but using Homescan data.
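A minimal sketch of how a household-level SL price could be computed from sub-group CPIs and within-group budget shares is given here for illustration. The formula is the one commonly used in the Stone-Lewbel literature (a share-weighted geometric mean of price-to-share ratios, scaled so that a reference household facing unit prices has a price of 1). It is an assumption that the project used exactly this form. The function name, the choice of reference shares, and the example numbers are hypothetical.

```python
from math import exp, log
from typing import Optional, Sequence


def sl_price(
    sub_prices: Sequence[float],
    sub_shares: Sequence[float],
    ref_shares: Sequence[float],
) -> Optional[float]:
    """Stone-Lewbel price of one commodity group for one household.

    sub_prices: price indices (e.g. regional CPIs) of the sub-groups.
    sub_shares: the household's within-group budget shares (sum to 1).
    ref_shares: within-group shares of a reference household (assumed positive).
    Returns None when any household share is zero, mirroring the report's
    remark that the SL price is undefined for censored observations.
    """
    if any(w <= 0.0 for w in sub_shares):
        return None
    # Scaling constant k = prod(ref_share ** -ref_share), kept in logs.
    log_k = -sum(wb * log(wb) for wb in ref_shares)
    log_price = sum(w * log(p / w) for p, w in zip(sub_prices, sub_shares))
    return exp(log_price - log_k)


# Hypothetical example: two sub-groups with CPIs of 1.04 and 0.98.
example = sl_price([1.04, 0.98], [0.7, 0.3], [0.6, 0.4])
censored = sl_price([1.04, 0.98], [1.0, 0.0], [0.6, 0.4])  # None
```

Because the household's own budget shares enter the index, two households facing the same CPIs generally get different SL prices, which is the source of the additional price variation described in the project methods.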

    Publications

    • Type: Conference Papers and Presentations Status: Accepted Year Published: 2012 Citation: Brady, K., C.E. Carpio, and T. Boonsaeng. Temporal Aggregation in the Estimation of Food Demand using Cross Sectional Data. Annual Meetings of the American Agricultural and Applied Economics Association, Seattle, Washington, August 12-14, 2012.
    • Type: Conference Papers and Presentations Status: Accepted Year Published: 2012 Citation: Castellon, C., T. Boonsaeng, and C.E. Carpio. Demand System Estimation in the Absence of Price Data. Annual Meetings of the American Agricultural and Applied Economics Association, Seattle, Washington, August 12-14, 2012.
    • Type: Theses/Dissertations Status: Accepted Year Published: 2012 Citation: Leffler, K. 2012. Temporal Aggregation and Treatment of Zero Dependent Variables in the Estimation of Food Demand using Cross-Sectional Data. Unpublished M.S. thesis. Department of Applied Economics and Statistics, Clemson University.
    • Type: Theses/Dissertations Status: Accepted Year Published: 2012 Citation: Castellon, C. 2012. Demand for Food In Ecuador and the United States: Evidence from Household-Level Survey Data. Unpublished M.S. thesis. Department of Applied Economics and Statistics, Clemson University.


    Progress 02/15/11 to 02/14/12

    Outputs
    OUTPUTS: The main objective of this study is to evaluate the potential of using publicly available datasets from the Bureau of Labor Statistics (BLS) and state-of-the-art econometric methods in lieu of the privately owned Homescan data for the econometric estimation of food demand models.
    Pre-estimation stage (dataset creation): 1) BLS data: We constructed two datasets using the Consumer Expenditure Survey (CEX) data and the monthly and quarterly consumer price indices (CPIs) for 2002 to 2006 (6,000 observations/year). The first dataset corresponds to expenditures and prices of eight aggregate food commodities. The second dataset corresponds to expenditures and prices of a disaggregate fruits and vegetables group. Commodity group prices (Stone-Lewbel prices, SL price indices) were calculated using CPIs and subgroup budget shares. 2) Nielsen Homescan data: Two datasets comparable to those constructed with the BLS data were also constructed using the Nielsen data. To construct food commodities, individual products (at the brand-flavor-size level) first had to be combined into aggregate products. We included Nielsen Homescan surveys for 2002-2006 (7,000 observations/year). Fisher price indices were used as estimates of commodity group prices.
    Estimation Stage-Phase 1: We have conducted the following analyses: 1) Estimation of a demand system for eight food commodities using CEX expenditure data and BLS CPI based SL prices, following the Hoderlein and Mihaleva (2008, J. Econometrics) procedure. The procedures proposed by these authors were modified to account for the presence of zero expenditures in the CEX survey data. We explored the robustness of the elasticity estimates to the use of different CPIs (monthly, quarterly and constant prices). 2) Estimation of a demand system for the same group of goods and the same demand system used in Step 1 but using Homescan data. Step 3 involves the comparison of elasticity and welfare measures obtained in Steps 1 and 2.
    However, since the time frame for data collection differs between the CEX survey (2 weeks) and Homescan (at least 10 months), in Step 2 we estimated and compared demand systems using two different levels of temporal aggregation for each individual in the sample: a randomly selected month and the average month within a year. Given the high percentage of zero expenditures in the monthly data, we used two econometric methods to estimate demand models with this dataset: the Shonkwiler and Yen (1999, AJAE) two-step censored demand estimation procedure, and a simple OLS method (Blundell and Meghir, 1987, J. Econometrics). Since the monthly data are a random sample from the annual data, the elasticities and marginal effects estimated using the annual data are assumed to be the "true" values.
    Conferences, Collaborations, Dissemination: "Using Scanner Data to Answer Food Policy Questions", conference organized by the USDA-ERS, June 1-2, 2011. Collaboration with USDA-ERS personnel in charge of developing the QFAHPD. Two paper proposals were submitted to the 2012 American Agricultural Economics Association organizing committee. Students involved in project: Kristyn Leffler, M.Sc. in Applied Economics and Statistics (May 2012).
    PARTICIPANTS: Carpio, C.E.: Coordination and project management; lead on econometric aspects of the project and preparation of reports. Boonsaeng, T.: Oversight of, and work on, data management and model estimation. Graduate students working on the project (M.Sc. students): 1) Kristyn Leffler: Nielsen Homescan data management and analysis. 2) Cesar Castellon: BLS data management and analysis. Partner Organizations: USDA-Economic Research Service.
    TARGET AUDIENCES: Nothing significant to report during this reporting period.
    PROJECT MODIFICATIONS: Nothing significant to report during this reporting period.
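For readers unfamiliar with the Fisher price index used for the Homescan commodity groups, the following sketch shows the standard calculation: the geometric mean of the Laspeyres and Paasche indices. The report does not say which base basket was used, so the base prices and quantities here (for example, sample-average prices and quantities of the aggregated products) are an assumption, and the function names are hypothetical.

```python
from math import sqrt
from typing import Sequence


def laspeyres(p0: Sequence[float], p1: Sequence[float], q0: Sequence[float]) -> float:
    """Cost of the base basket q0 at comparison prices p1 relative to base prices p0."""
    return sum(a * q for a, q in zip(p1, q0)) / sum(b * q for b, q in zip(p0, q0))


def paasche(p0: Sequence[float], p1: Sequence[float], q1: Sequence[float]) -> float:
    """Cost of the comparison basket q1 at prices p1 relative to base prices p0."""
    return sum(a * q for a, q in zip(p1, q1)) / sum(b * q for b, q in zip(p0, q1))


def fisher(
    p0: Sequence[float],
    p1: Sequence[float],
    q0: Sequence[float],
    q1: Sequence[float],
) -> float:
    """Fisher ideal index: geometric mean of Laspeyres and Paasche."""
    return sqrt(laspeyres(p0, p1, q0) * paasche(p0, p1, q1))


# Hypothetical commodity group made of two aggregated products.
base_prices, base_quantities = [1.0, 2.0], [10.0, 5.0]
household_prices, household_quantities = [2.0, 2.0], [5.0, 10.0]
group_price = fisher(base_prices, household_prices, base_quantities, household_quantities)
```

In this example the Laspeyres index is 1.5 and the Paasche index is 1.2, so the Fisher index lies between them.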

    Impacts
    Since the main findings of this research project concern the results of the estimation of food demand systems using different data sources, the summary of the outcomes is separated by the type of data used in the estimation. 1) Demand System Estimation using CEX expenditure data and BLS CPI based SL prices (eight food commodity groups). 1.1. A Method for Estimation of SL Prices with Censored Expenditure Data. As mentioned previously, the construction of SL price indices combines CPIs and budget shares of the sub-groups that make up the aggregate commodity. However, when expenditures in one or several of the sub-groups are zero, the aggregate commodity SL price is undefined. Hoderlein and Mihaleva (2008) avoided the problem by dropping the censored observations. This solution is plausible for low levels of censoring but becomes quite restrictive when the share of zero observations is high. With this in mind, we propose the use of a regression imputation approach similar to the one used in the literature for the calculation of quality adjusted unit values. The procedure involves two steps: 1) regressing SL prices of non-censored observations on a set of demographic characteristics, and 2) estimating the SL prices using the parameters of the regression obtained in step 1 and the socio-demographic characteristics of households with missing prices. 1.2. Sensitivity of Estimation Results to the Use of Different CPIs. Three series of SL prices were constructed using alternative regional CPIs: monthly, quarterly, and no price variation across households. We then explored the robustness of the estimation results to the chosen CPI. Our overall results indicate that the elasticity and marginal effect estimates are not sensitive to the type of CPI used. We conclude that the incorporation of CPI data in the calculation of SL prices plays a limited role, thereby making possible the estimation of demand systems in the absence of price information.
    2) Demand System Estimation using Nielsen Homescan Data (eight food commodity groups). We conclude that the models using monthly data closely approximate the underlying annual expenditure elasticities, but do a poor job of estimating own- and cross-price elasticities and marginal effects. This finding holds for both the uncensored model (OLS model) and the censored model that attempts to account for the cause of the zero expenditures. As a result, we conclude that the simplicity principle applies, at least when using Homescan data: the more complex model does not provide a significant improvement in precision. The use of shorter time frames has implications for the resulting price elasticities and marginal effects: they will be inconsistent, but not consistently biased in any direction.
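The two-step regression imputation of SL prices described in item 1.1 can be sketched as follows. This is an illustration under stated assumptions rather than the project's code: the regression is plain OLS with an intercept, missing prices are represented by None, and the report does not say whether prices entered in levels or logs (the caller decides here). All function names are hypothetical.

```python
from typing import List, Optional, Sequence


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("regressors are collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def ols(x_rows: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """OLS coefficients from the normal equations (X'X) beta = X'y."""
    k = len(x_rows[0])
    xtx = [[sum(r[i] * r[j] for r in x_rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(x_rows, y)) for i in range(k)]
    return solve(xtx, xty)


def impute_prices(
    prices: Sequence[Optional[float]],
    demographics: Sequence[Sequence[float]],
) -> List[float]:
    """Fill missing SL prices with fitted values from a demographic regression.

    Step 1: regress the observed (non-censored) prices on demographics.
    Step 2: predict prices for households whose SL price is missing (None).
    """
    design = [[1.0] + list(d) for d in demographics]  # add intercept
    observed = [(x, y) for x, y in zip(design, prices) if y is not None]
    beta = ols([x for x, _ in observed], [y for _, y in observed])
    return [
        y if y is not None else sum(b * v for b, v in zip(beta, x))
        for x, y in zip(design, prices)
    ]


# Hypothetical data: one demographic variable (household size); the last
# household has a censored sub-group and therefore no SL price.
filled = impute_prices([0.1, 0.2, 0.3, None], [[1.0], [2.0], [3.0], [4.0]])
```

Observed prices are passed through unchanged. Only the households with missing prices receive fitted values, so the imputed prices vary across households solely through their demographic characteristics.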

    Publications

    • Leffler, K. 2012. Temporal Aggregation and Treatment of Zero Dependent Variables in the Estimation of Food Demand using Cross-Sectional Data. Unpublished M.S. thesis. Department of Applied Economics and Statistics, Clemson University.