Source: Clark University submitted to NRP
PARTNERSHIP: INTEGRATING LOCALLY-WEIGHTED META-REGRESSION AND MACHINE LEARNING TO CAPTURE SPATIAL COMPLEXITY IN MULTI-SCALE BENEFIT TRANSFER
Sponsoring Institution
National Institute of Food and Agriculture
Project Status
ACTIVE
Funding Source
Reporting Frequency
Annual
Accession No.
1032368
Grant No.
2024-67023-42696
Cumulative Award Amt.
$799,772.00
Proposal No.
2023-09504
Multistate No.
(N/A)
Project Start Date
Jul 15, 2024
Project End Date
Jul 14, 2027
Grant Year
2024
Program Code
[A1651]- Agriculture Economics and Rural Communities: Environment
Recipient Organization
Clark University
950 Main St.
Worcester,MA 01610
Performing Department
(N/A)
Non Technical Summary
The USDA spends billions of dollars per year on conservation to enhance environmental quality, ecosystem services and agricultural sustainability. The biophysical impacts of these programs (e.g., on soil retention and water quality) are relatively well understood and can be estimated using standard modeling approaches. Yet the economic benefits of these programs remain largely unknown, and credible information on non-market benefits is particularly lacking. Despite an extensive literature on non-market valuation, the methods from this literature are often impractical to use for the estimation of values provided by USDA and other conservation programs. Large-scale, applied valuation of this type almost universally requires benefit transfer, or BT. BT uses existing economic value estimates from prior studies at one or more locations to predict economic value estimates such as willingness to pay (WTP) at other, typically unstudied locations. Benefit transfer can produce economic value estimates for areas and ecosystem service improvements for which original economic valuation studies have not been conducted--thereby quantifying the economic value of large-scale agricultural conservation to the public. Yet BT methods to support reliable large-scale valuation are inadequately developed, particularly for applications such as resource conservation and water quality improvements with widespread, diffuse and patchy impacts. Due to the lack of sufficiently accurate BT methods, USDA and its partners struggle to produce credible estimates of non-market conservation benefits.
Addressing this unresolved research and policy need requires a set of flexible, standardized BT approaches that are able to predict benefits for the large spatial scales over which conservation occurs, while simultaneously accounting for the important effects of localized, place-specific spatial and other dimensions on values for environmental improvements.
Responding to this important gap in knowledge and methods for economic analysis, the present project will develop and evaluate BT procedures with a previously unattainable capacity to account for localized spatial heterogeneity of conservation-related environmental improvements over large spatial scales (such as water quality improvements), while identifying areas wherein improvements are most valued by target populations, including disadvantaged communities. Although applicable to any conservation outcome, methods will be illustrated for BTs that predict willingness to pay (WTP) for spatially diffuse water quality improvements.
The project addresses USDA AFRI Environmental and Natural Resource Economics (A1651) program priorities, which call for "benefit transfer to inform benefit-cost calculations for conservation and natural resource policy design and implementation." These novel methods will integrate (1) locally weighted meta-regression models (LW MRMs) for WTP metadata that produce unique benefit functions for each site, (2) interactive map-based survey architecture that identifies highly valued ("salient") areas nationwide for specific environmental improvements, (3) a machine-learning spatial salience classification model (SCM) that uses these survey data to provide generalizable predictions, for any Census Block Group (CBG) nationwide, on the degree to which any potential watershed area is salient (or particularly important) to residents for water quality improvements, and (4) validated metadata on household WTP for water quality improvements in US waterbodies drawn from prior studies in the valuation literature, augmented with SCM results to support LW MRMs with enhanced WTP-prediction accuracy.
The project will provide transformative yet standardized BT methods able to predict values due to diffuse conservation over large scales, together with a set of nationwide spatial salience data layers that can be used independently to identify areas wherein water quality improvements are most valued by residents of any CBG. These methods will enhance the accuracy of large-scale BT, by incorporating systematic information on the extent to which patchy environmental improvements occur in local (or non-local) areas that are important (or salient) to households. In doing so, these approaches will increase the capacity of USDA and others to quantify the economic values generated by agricultural conservation programs.
Animal Health Component
70%
Research Effort Categories
Basic
(N/A)
Applied
70%
Developmental
30%
Classification

Knowledge Area (KA)    Subject of Investigation (SOI)    Field of Science (FOS)    Percent
605                    0399                              3010                      45%
605                    0399                              2090                      40%
605                    0399                              2050                      15%
Goals / Objectives
This project will develop and evaluate benefit transfer (BT) procedures able to account for localized spatial heterogeneity of conservation-related environmental improvements over large spatial scales, while identifying areas wherein improvements are most valued by target populations, including disadvantaged communities. These novel methods will enhance the accuracy of large-scale BT, by introducing systematic information on the extent to which environmental improvements occur in local (or non-local) areas that are important (or salient) to households living in specific Census Block Groups (CBGs). Although applicable to any conservation outcome, methods will be illustrated for BTs that predict willingness to pay (WTP) for diffuse water quality improvements. These new BT methods will integrate (1) locally weighted meta-regression models (LW MRMs) for WTP metadata that produce unique benefit functions for each site across the landscape, (2) interactive map-based survey architecture that identifies highly valued ("salient") areas nationwide for specific types of environmental improvements, (3) a machine-learning (ML) spatial salience classification model (SCM) that uses these survey data to provide generalizable predictions, for any CBG nationwide, on the degree to which any potential watershed area is salient to residents for water quality improvements, and (4) validated metadata on household WTP for water quality improvements in US waterbodies, augmented with SCM results to support LW MRMs with enhanced WTP-prediction accuracy. The project will provide standardized BT methods able to predict values due to diffuse conservation over large scales, together with a set of nationwide spatial salience data layers that can be used independently to identify areas wherein water quality improvements are most valued by residents of any CBG.
The project is organized around eight primary objectives:
1. Finalize the theory and procedures necessary to integrate newly developed methods for Bayesian LW MRMs with novel ML algorithms for spatial salience classification, with the integrated approach designed to support spatially explicit BTs of water quality benefits from heterogeneous and patchy resource conservation over large spatial scales.
2. Develop and implement a large-sample spatial salience survey to elicit data on the degree to which US households value, or view as "salient," potential water quality improvements in different HUC 8 watersheds regionally and nationwide. Integrate survey data with supporting information from publicly available, nationwide geospatial and socioeconomic data layers.
3. Using the integrated survey, geospatial and socioeconomic data, implement the ML spatial SCM to predict, for every CBG nationwide, the degree to which any nationwide HUC 8 watershed is considered to be relatively salient (or is prioritized) by CBG residents for water quality improvements.
4. Using spatial salience predictions, retrospectively update the PDs' existing metadata on per household WTP for water quality improvements to incorporate information on the extent to which the improvements that were valued by each underlying, primary study in the metadata likely occurred in salient or non-salient watersheds, for the originally sampled households.
5. Using the extended metadata, specify and implement a novel LW MRM of per household WTP for water quality improvements, incorporating previously unavailable information on the localized spatial salience of the affected waterbodies.
6. Using both out-of-sample cross validation (CV) and an applied, large-scale BT case study in the Mid-Atlantic US, evaluate the BT performance of the integrated SCM / LW MRM for predicting per household and aggregate WTP, and compare the accuracy of the new approach to that of prior MRM BTs that are unable to accommodate localized spatial salience.
7. Using SCM results, further develop novel, nationwide spatial salience geospatial data layers that can be used to inform BTs and other valuation modeling, identifying the specific HUC 8 watersheds for which water quality improvements are predicted to be salient for households in any selected CBG across the continental US, including CBGs associated with disadvantaged communities or those with environmental justice concerns.
8. Coordinate results to provide methods, guidance and protocols for meta-function BTs able to predict WTP over large spatial scales while capturing the effects of local heterogeneity in quality changes and benefits to marginalized communities.
Project Methods
Project methods are organized around eight primary tasks.
1. We will first finalize the theory and procedures to integrate newly developed methods for Bayesian locally weighted meta-regression models (LW MRMs) with machine learning (ML) algorithms for spatial salience classification, with the integrated approach designed to support spatially explicit benefit transfers (BTs) of water quality benefits from heterogeneous and patchy resource conservation over large spatial scales. Our methodological point of departure is the LW MRM developed by the PDs for a prior USDA AFRI supported project. This LW MRM will be enhanced to account for the extent to which spatially heterogeneous water quality improvements occur in watershed areas (here, HUC 8s) that are salient or non-salient to households residing in each census block group (CBG), where the salience of particular watersheds can (and likely will) vary across households in different CBGs.
2. Extending the spatially interactive survey architecture under development by the PDs, we will then develop and implement a large-sample spatial salience survey to elicit data on the degree to which US households value, or view as "salient," potential water quality improvements in different HUC 8 watersheds regionally and nationwide. These methods augment a traditional online survey instrument with integrated, dynamic GIS maps that offer a visually intuitive alternative to traditional survey questions aimed at eliciting features of interest in geographic space. Adapting these methods, we will develop, pretest, and implement a novel, nationwide, push-to-web spatial salience survey of randomly selected households across the continental US. Respondents will be asked, in a guided fashion and via map interactions, to identify HUCs of special importance across the country for water body uses and water quality improvements, e.g., due to recreational preferences, family ties, or for other use or non-use reasons. The interface will present each respondent with an interactive map of the contiguous US, overlaid with HUC 8 boundaries of affected watersheds (showing water bodies, key places, etc.), that allows survey respondents to pan and zoom to any desired location, as well as search for features using a natural language search tool, and then indicate the salience of the watershed containing these features. The survey data will be linked to supporting information from publicly available, nationwide geospatial and socioeconomic data layers.
3. Using the integrated survey, geospatial and socioeconomic data, we will design and implement an ML spatial salience classification model (SCM) to predict, for every CBG nationwide, the degree to which any nationwide HUC 8 watershed is considered to be relatively salient (or is prioritized) by CBG residents for water quality improvements. For any CBG, we hypothesize that the salience indicators elicited in Task 2 will be related to multiple attributes of each HUC 8, each of which can ex ante be envisioned as affecting salience status, such as total river miles, area of lakes and reservoirs, special recreation areas (day use areas, state parks, etc.), total population residing within the HUC, and distance from the respondent's home CBG, among many others. Other explanatory variables will include Census (demographic) information from the respondent's home CBG, such as racial/ethnic category composition, age structure, educational attainment, and income. Grounded in these hypothesized relationships, a core methodological contribution of the proposed research is the development of an entirely new SCM using ML methods. The ultimate goal of the SCM is to determine, for any CBG / HUC 8 combination in the contiguous US, whether the HUC is deemed salient from the perspective of that CBG. The SCM will be estimated using an adaptation of Random Forests (RFs).
4. The next task will retrospectively update the PDs' extant metadata on per household WTP for water quality improvements to incorporate information on the extent to which the improvements that were valued by each underlying, primary study observation in the metadata occurred in salient or non-salient watersheds, for the originally sampled households, as predicted by the SCM. We will also review the literature for new studies that can potentially be added, thereby extending metadata size and coverage.
5. The new set of spatial salience variables, calculated for each metadata source observation, will support an augmented specification of the baseline LW MRM. This extended MRM will produce a uniquely tailored benefit function for any desired "policy site" and water quality scenario for which benefit estimates are required. This extended MRM will predict per household benefits as a function of the degree to which spatially heterogeneous water quality improvements in the considered policy scenario occur in home, salient or non-salient areas for households living in each CBG within the BT market area, in addition to effects of the wide array of additional MRM variables (e.g., on distance, the aggregate scale of affected water bodies, affected landscapes, the scope of water quality change, income, affected uses, affected water body types, etc.). The integrated model will enable the first large-scale BT able to accommodate both continuous spatial effects (e.g., distance, affected area size, spatial substitutes) along with discrete preference discontinuities and WTP hot/cold spots due to localized resource salience and heterogeneous environmental effects.
6. We will evaluate the performance of the new approach as applied to implement BT for a multi-state water quality improvement scenario. We anticipate an illustrative application with heterogeneous and diffuse changes in water quality over the multi-state Mid-Atlantic region. These evaluations will compare the accuracy of the new approach to that of prior MRM BTs that are unable to accommodate localized spatial salience, using out-of-sample cross validation. We will consider the extent to which applied results from the proposed approach differ from those produced by current methods, including the extent to which each type of BT is sensitive to the spatial distribution of water quality changes across the landscape. We will consider implications not only for per household and aggregate WTP estimates, but also how these values are distributed across CBGs within the case study area. We will further consider the capacity of the proposed BTs to identify water quality changes that are most valued by disadvantaged communities, and to estimate the benefits realized by these groups.
7. This task will reformat SCM results into a set of nationwide spatial-salience data layers. These layers will identify the specific HUC 8 watersheds for which water quality improvements are predicted to be salient for households in any selected CBG across the contiguous US. These geospatial data layers will be designed for "plug and play" use within other, future BTs and policy studies, including those that do not rely on LW MRMs. Among other uses, this new data resource can be used by researchers and practitioners to draw insight into the relevant extent of the market for water quality changes, by showing the specific areas for which water quality changes are deemed to be most salient for residents of any given CBG nationwide (focusing on the greatest distance for which watersheds are predicted to be salient). It can also be used to identify areas wherein water quality changes will be most important to households in CBGs with disadvantaged communities or those with environmental justice concerns.
8. The final task will coordinate project results to provide generalizable insight into, and practical guidance and protocols for, BT procedures using the proposed methods, along with the accuracy to be expected when applying these methods.
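The out-of-sample cross validation named in Task 6 can be sketched as follows. This is a minimal illustration, not the project's evaluation code: it assumes a leave-one-out design, a mean absolute percentage error metric, and two stand-in predictors (a pooled mean that ignores salience and a one-variable linear fit on a hypothetical "salient share" covariate). The function names and the six metadata rows are made up for the example.

```python
def fit_mean(train):
    """Baseline transfer: predict the pooled mean WTP, ignoring salience."""
    mean_wtp = sum(y for _, y in train) / len(train)
    return lambda x: mean_wtp


def fit_line(train):
    """Stand-in for a salience-aware model: least squares of WTP on salient share."""
    n = len(train)
    mx = sum(x for x, _ in train) / n
    my = sum(y for _, y in train) / n
    sxx = sum((x - mx) ** 2 for x, _ in train)
    sxy = sum((x - mx) * (y - my) for x, y in train)
    slope = sxy / sxx if sxx > 0 else 0.0
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


def loo_mape(data, fit):
    """Leave-one-out CV: refit without each observation, then predict it."""
    errors = []
    for i, (x, y) in enumerate(data):
        train = data[:i] + data[i + 1:]
        predict = fit(train)
        errors.append(abs(predict(x) - y) / abs(y))
    return 100.0 * sum(errors) / len(errors)


# Hypothetical metadata: (share of improved miles in salient waters, WTP in dollars).
metadata = [(0.0, 21.0), (0.2, 25.0), (0.4, 33.0),
            (0.6, 37.0), (0.8, 45.0), (1.0, 49.0)]

mape_mean = loo_mape(metadata, fit_mean)
mape_line = loo_mape(metadata, fit_line)
print(f"pooled mean MAPE: {mape_mean:.1f}%  salience-aware MAPE: {mape_line:.1f}%")
```

Because each observation is predicted by a model that never saw it, the comparison mimics a transfer to an unstudied site. The same loop could wrap any pair of candidate models.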

Progress 07/15/24 to 07/14/25

Outputs
Target Audience: The target audiences reached during this reporting period include the project's Advisory Board, academics (university/college faculty and researchers), graduate students in environmental economics, scientists, and government agencies including USDA, along with their staff.
Changes/Problems: Nothing Reported
What opportunities for training and professional development has the project provided? The project is providing training for a mid-career research scientist at Clark University, along with providing skillset development opportunities at ICF for early and mid-career staff.
How have the results been disseminated to communities of interest? Initial results on Local Linear Forests (LLFs) for valuation meta-analysis and BT have been presented at multiple conferences and invited seminars in both the US and internationally. Presentations to date include:
Johnston, R.J. and K. Moeltner. 2024. Advancing the Frontier of Data Synthesis for Environmental Benefit Transfer: From Globally Linear Meta-Regression to Local Linear Forests. Environmental Economics Seminar, Yale School of the Environment, New Haven, CT, November 20.
Johnston, R.J. and K. Moeltner. 2025. Random Forests for Benefit Transfer. 30th Annual Meeting of the European Association of Environmental and Resource Economists, Bergen, Norway, June 16-19.
Johnston, R.J. and K. Moeltner. 2025. Advancing the Frontier of Data Synthesis for Environmental Benefit Transfer: From Meta-Regression to Local Linear Forests. Association of Environmental and Resource Economists Summer Conference, Santa Ana Pueblo, NM, May 28-30.
Johnston, R.J. and K. Moeltner. 2025. Advancing the Frontier of Data Synthesis for Environmental Benefit Transfer: From Globally Linear Meta-Regression to Local Linear Forests. Society for Benefit-Cost Analysis 2025 Annual Conference, Washington, DC, March 13-14.
What do you plan to do during the next reporting period to accomplish the goals? Work during the next reporting period will include tasks necessary to complete Objectives 1 - 5. We intend to complete the final steps required for Objective 1, finalizing all theory and procedures necessary to guide model implementation, metadata development and spatial-salience survey design. We will continue to develop, pretest and implement the novel, nationwide, push-to-web spatial salience survey of randomly selected households across the continental US (Objective 2). Using the integrated survey, geospatial and socioeconomic data, we will design and implement an ML spatial salience classification model (SCM) to predict, for every Census tract (CT) nationwide, the degree to which any HUC 8 watershed in the conterminous US is considered to be relatively salient (or is prioritized) by CT residents for water quality improvements (Objective 3). The next task will retrospectively update the PDs' extant metadata on per household WTP for water quality improvements to incorporate information on the extent to which the improvements that were valued by each underlying, primary study observation in the metadata occurred in salient or non-salient watersheds, for the originally sampled households, as predicted by the SCM (Objective 4). This new set of spatial salience variables, calculated for each metadata source observation, will support an augmented specification of the meta-analysis via LLF. This extended model will predict per household benefits as a function of the degree to which spatially heterogeneous water quality improvements in the considered policy scenario occur in home, salient or non-salient areas for households living in each CT within the BT market area, in addition to effects of the wide array of additional MRM variables (Objective 5).

Impacts
What was accomplished under these goals? An in-person project kick-off meeting was held in Reston, VA on February 27-28, with all senior personnel in attendance. Among other meeting objectives, a comprehensive plan of work was developed for project tasks over the three-year project timeframe. Based on workplans developed during this meeting, work during the first project year emphasized the tasks required to complete Objectives 1, 2, 3 and 4. We have now largely completed Objective 1--developing the foundational theory and analytical methods required to integrate novel methods for valuation meta-analysis (applied to per household willingness to pay (WTP) for water quality improvements) with spatial salience classifications derived using machine learning (ML) algorithms and household survey data. The integrated approach will support spatially explicit benefit transfers (BTs) of water quality benefits from heterogeneous resource conservation over large spatial scales. The first component of this work was to finalize the theory and corresponding variable definitions required to update the research team's existing valuation metadata on water quality values with spatial salience information. As detailed in the original proposal, we will update the metadata with variables that quantify the "salience" of affected waters, defined as the extent to which the water quality improvements considered within each primary study (within the metadata) occur within "home," "salient" or "ordinary" bodies of water, for the originally sampled households. We hypothesize that the salience of these affected waters will influence households' WTP for water quality improvements. We anticipate that incorporation of this information within valuation meta-analysis will allow more accurate WTP prediction and thus more accurate BT.
As an initial step towards these proposed models, formal definitions of the proposed spatial salience variables were required, allowing these variables to be calculated for metadata observations. To summarize, each metadata observation includes a per household WTP estimate for a water quality scenario that reflects improvements to waters within a set of hydrologic unit code (HUC) 8 watersheds. We denote these improved watersheds as the "affected watershed area" or AWA. Each WTP estimate was produced using primary study data drawn from an original stated preference survey that sampled a population within a given sampled area (or SA). For each of these primary-study observations, we first identify all Census tracts (CTs) and HUC 8 watersheds contained in its SA, as well as all watersheds included in its AWA. We also determine the share of the sampled population from the original study for whom a given HUC 8 watershed (covered by the study's valuation scenario) is a "home HUC" (where they live). Using nationwide household survey data to be collected under Objective 2 (see below), we will then predict whether each HUC 8 watershed covered by each observation's AWA is "salient" (or important) for corresponding CT residents, when considering water quality improvements. This salience information will be used to determine the proportions of improved miles or area that are home, salient, or ordinary (non-salient and non-home) for the observation's SA population. Grounded in the formal definitions and procedures designed during the first project year, these novel spatial salience variables will be derived via implementation of a spatial salience questionnaire (Objective 2) integrated with a novel application of ML methods for spatial salience classification and prediction (Objective 3).
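The home / salient / ordinary split described above can be sketched for a single metadata observation as follows. This is a minimal illustration under stated assumptions, not the project's formal variable definitions: it assumes the shares are weighted by Census tract population, that a tract's home HUC takes precedence over its salience prediction, and that improvements are measured in river miles. The function name, the watershed labels (`HUC8_A`, etc.) and all numbers are hypothetical.

```python
def salience_shares(improved_miles, tracts):
    """Population-weighted shares of improved miles that are home, salient, or ordinary.

    improved_miles: dict mapping a HUC 8 label to improved river miles in the AWA.
    tracts: list of dicts, one per Census tract in the sampled area (SA), with keys
        'population', 'home_huc' (label) and 'salient_hucs' (set of labels that a
        salience model predicts to be salient for that tract).
    """
    total_miles = sum(improved_miles.values())
    total_pop = sum(t["population"] for t in tracts)
    if total_miles <= 0 or total_pop <= 0:
        raise ValueError("need positive improved miles and population")

    shares = {"home": 0.0, "salient": 0.0, "ordinary": 0.0}
    for tract in tracts:
        weight = tract["population"] / total_pop
        for huc, miles in improved_miles.items():
            if huc == tract["home_huc"]:
                category = "home"          # home status takes precedence (assumption)
            elif huc in tract["salient_hucs"]:
                category = "salient"
            else:
                category = "ordinary"      # neither home nor salient
            shares[category] += weight * miles / total_miles
    return shares


# Hypothetical observation: three improved watersheds, two tracts in the SA.
improved = {"HUC8_A": 30.0, "HUC8_B": 50.0, "HUC8_C": 20.0}
tracts = [
    {"population": 3000, "home_huc": "HUC8_A", "salient_hucs": {"HUC8_B"}},
    {"population": 1000, "home_huc": "HUC8_C", "salient_hucs": set()},
]
shares = salience_shares(improved, tracts)
print(shares)
```

The three shares sum to one by construction, so any two of them could enter a meta-regression as covariates with the third as the omitted baseline.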
We will then update the underlying metadata with the newly derived spatial-salience variables (Objective 4), allowing estimation of an extended meta-analysis that incorporates this new information (Objective 5). The meta-analysis will thus account for the extent to which spatially heterogeneous water quality improvements due to conservation occur to water bodies that are salient or non-salient to households.
We also made progress towards completion of Objective 2, developing initial plans for the large-sample spatial salience survey that will elicit data on the degree to which US households value, or view as "salient," potential water quality improvements in different HUC 8 watersheds nationwide. This survey design extends methods developed by the PDs for another ongoing USDA AFRI project, Partnership: Next Generation Choice Experiment Architecture for Spatially Explicit Agricultural Conservation and Ecosystem Service Valuation (#2022-67023-36735), which develops questionnaire architecture to elicit spatial salience information statewide, with an application to Virginia. The current project will adapt this architecture for larger-scale, nationwide US applications. Within this new online questionnaire, respondents will be asked, in a guided fashion and via map interactions, to identify HUCs of special importance across the US mainland for water body uses and water quality improvements, e.g., due to recreational preferences, family ties, or for other reasons. The interface will present each respondent with an interactive map of the contiguous US, overlaid with HUC 8 boundaries of the AWA (showing water bodies, key places, etc.), that allows survey respondents to pan and zoom to any desired location, as well as search for features using a natural language search tool, and then indicate the salience of the watershed containing these features and explain the reasons for salience.
To support predictions of the extent to which watersheds are salient to respondents (Objective 3), the survey data will be linked to supporting information from publicly available, nationwide geospatial and socioeconomic data layers. Examples of these secondary watershed and population data summarized during the first year included watershed and individual land use area, length of streams, area of waterbodies, number of unique native fish species, area of public land, average number of cloudy days per year, average number of clear days per year, average depth of precipitation per year, area of ecoregions, number of drinking water wells and intakes, public health statistics, average air quality, length of impaired stream segments, and presence of a coastal beach.
Progress has also been made towards Objective 5, which will specify and implement the meta-analysis of per household WTP for water quality improvements. We have designed the underlying econometric model and code that will be used to implement the proposed modeling and subsequent BT predictions of changes in water quality value from expected improvements. The approach is grounded in meta-analysis implemented via Local Linear Forests (LLFs), a novel approach to data-synthetic BT that allows the metadata to be processed in a manner that enables more efficient, accurate and straightforward BT value predictions, compared to current best-practice meta-regression models (MRMs). As applied to valuation meta-analysis, LLFs are essentially a hybrid approach that combines elements of Random Forests (RFs) and locally weighted meta-regression models (LW MRMs). Initial evaluations suggest that this new approach substantially improves BT accuracy without sacrificing theoretical properties, while reducing econometric and computational difficulties relative to leading alternatives. For example, we find that forest-based models substantially improve the within-sample accuracy of welfare predictions and tighten confidence intervals of predicted benefits for out-of-sample transfers. Simultaneously, these models avoid the implementation challenges of complex, regression-based approaches such as LW MRMs.
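The locally weighted idea that LLFs share with LW MRMs, namely a separate benefit function fitted for each policy site, can be sketched as follows. This is not an LLF and not the project's model: as a simplifying assumption it uses Gaussian kernel weights on a single covariate, whereas an LLF would derive its weights from a forest. The function name, the bandwidth and the toy metadata are hypothetical.

```python
import math


def local_linear_predict(data, x0, bandwidth):
    """Kernel-weighted least squares centered at the policy-site value x0.

    data: list of (x, wtp) pairs from the metadata.
    Returns (intercept, slope, prediction at x0) for the site-specific function.
    """
    # Observations closer to x0 receive more weight in the local fit.
    weights = [math.exp(-0.5 * ((x - x0) / bandwidth) ** 2) for x, _ in data]
    sw = sum(weights)
    mx = sum(w * x for w, (x, _) in zip(weights, data)) / sw
    my = sum(w * y for w, (_, y) in zip(weights, data)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, (x, _) in zip(weights, data))
    sxy = sum(w * (x - mx) * (y - my) for w, (x, y) in zip(weights, data))
    slope = sxy / sxx if sxx > 0 else 0.0
    intercept = my - slope * mx
    return intercept, slope, intercept + slope * x0


# Hypothetical metadata: (water quality change in index units, WTP in dollars),
# with diminishing marginal WTP as the change grows.
metadata = [(1.0, 10.0), (2.0, 18.0), (3.0, 24.0),
            (4.0, 28.0), (5.0, 30.0), (6.0, 31.0)]

_, slope_low, wtp_low = local_linear_predict(metadata, x0=1.5, bandwidth=1.0)
_, slope_high, wtp_high = local_linear_predict(metadata, x0=5.5, bandwidth=1.0)
print(f"local slope near 1.5: {slope_low:.2f}; near 5.5: {slope_high:.2f}")
```

A single global regression would report one slope for all sites. The local fits recover a steeper marginal WTP at small quality changes than at large ones, which is the kind of site-specific heterogeneity the locally weighted approaches are meant to capture.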

Publications