Source: MICHIGAN STATE UNIV submitted to NRP
HIERARCHICAL MODELS FOR LARGE GEOSTATISTICAL DATASETS WITH APPLICATIONS TO FORESTRY AND ECOLOGY
Sponsoring Institution
National Institute of Food and Agriculture
Project Status
COMPLETE
Funding Source
Reporting Frequency
Annual
Accession No.
0218204
Grant No.
(N/A)
Cumulative Award Amt.
(N/A)
Proposal No.
(N/A)
Multistate No.
(N/A)
Project Start Date
May 1, 2009
Project End Date
Apr 30, 2014
Grant Year
(N/A)
Program Code
[(N/A)]- (N/A)
Recipient Organization
MICHIGAN STATE UNIV
(N/A)
EAST LANSING,MI 48824
Performing Department
Forestry
Non Technical Summary
With the increasing popularity and availability of spatial referencing technologies such as Geographical Information Systems (GIS) and Global Positioning Systems (GPS) that can identify geographical coordinates with a simple hand-held device, scientists and researchers engaged in a variety of disciplines today have access to geocoded data. Statistical models accounting for spatial associations have, not surprisingly, become an enormously active area of research over the last decade. Hierarchical models that model variation in multiple levels have, in particular, become extremely popular devices for spatial modeling. With spatial data that are multivariate (i.e. have numerous variables) and temporal (collected over time), scientists seek to hypothesize extremely rich association structures. These, in turn, lead to rather complex hierarchical models that are computationally expensive even for data collected over a moderate number of locations. Matters become completely impractical with a large number of sites (say thousands). This team recognizes a need for statistical modeling of large multivariate geostatistical data. The PI proposes a model-based setup to tackle a wide variety of large geostatistical or point-referenced datasets. The emphasis is on models that can be executed even with moderately powerful computing tools and so would be accessible to a large number of researchers.
Animal Health Component
(N/A)
Research Effort Categories
Basic
(N/A)
Applied
(N/A)
Developmental
(N/A)
Classification

Knowledge Area (KA)   Subject of Investigation (SOI)   Field of Science (FOS)   Percent
204                   0610                             1080                     10%
123                   0613                             2090                     60%
132                   0430                             2090                     10%
136                   0613                             2090                     20%
Goals / Objectives
This proposal lays down a comprehensive framework for carrying out statistical inference on point-referenced spatial data where data are available from a large number of locations. The focus of the proposal is methodological rather than purely theoretical or purely applied. Thus, statistical theory is used to develop mathematically formal but computationally feasible methods that can have a broad range of applications. Theoretical derivations and new results will be explored, but always keeping in mind the practising spatial analyst. The long-term goal of the PI is to develop a full suite of statistical methods that estimate spatial models in a wide variety of experiments in forestry and ecology. A recurrent underlying theme of the proposed methods that makes it different from existing methods is that the modeler does not need to sacrifice richness in modeling as a compromise for the large datasets. This resolves the statistical irony that large datasets are precisely where statistical estimates of rich association structures are permissible.
Project Methods
The PI will develop a full suite of statistical methods that estimate spatial models in a wide variety of experiments in forestry and ecology. Methods will include both mathematical/statistical theoretical development of models and computing environments and illustrative application of these models to answer challenging questions in the fields of forestry and ecology.

Progress 05/01/09 to 04/30/14

Outputs
Target Audience: Researchers, practitioners, and students in environmental and ecological sciences who analyze spatial-temporal data sets.
Changes/Problems: Nothing Reported
What opportunities for training and professional development has the project provided? The theory and accompanying modeling software were disseminated via peer-reviewed journal articles and short courses. The short courses were created for graduate students, faculty, and practitioners in applied environmental/ecological sciences. Recent short courses were given at:
1) Joint workshop between the "Ecology and Environment," "Bayes Methods," and "Spatial Statistics" working groups of the German Region of the International Biometric Society together with the "Forest Biometrics" unit of the German Association of Forest Research Stations, Freising, Germany, November 6-8, 2013.
2) National Ecological Observatory Network (NEON) Applied Bayesian Regression Workshop, Boulder, CO, March 7-8, 2013.
3) University of Nebraska–Lincoln, Department of Statistics, Lincoln, NE, October 15-16, 2012.
How have the results been disseminated to communities of interest? Through the same peer-reviewed journal articles and short courses described under training and professional development above.
What do you plan to do during the next reporting period to accomplish the goals? Nothing Reported

Impacts
What was accomplished under these goals? As noted in the major goals, I, along with colleagues, developed a new statistical model called the predictive process, which is used to draw inference from large point-referenced spatial-temporal data sets. This and related work was illustrated in several theoretical and applied studies published in peer-reviewed journal articles. The methodology was also developed into an open source software package called spBayes, which is available on the Comprehensive R Archive Network. This work has allowed modelers to avoid sacrificing richness in modeling when analyzing large spatial datasets.

Publications

  • Type: Journal Articles Status: Published Year Published: 2013 Citation: Finley, A.O., S. Banerjee, B.D. Cook, and J.B. Bradford. (2013) Hierarchical Bayesian spatial models for predicting multiple forest variables using waveform LiDAR, hyperspectral imagery, and large inventory datasets. International Journal of Applied Earth Observation and Geoinformation, 22:147-160.
  • Type: Journal Articles Status: Published Year Published: 2013 Citation: Swanson, A.*, S. Dobrowski, A.O. Finley, J.H. Thorne, and M.K. Schwartz. (2013) Spatial regression methods capture prediction uncertainty in species distribution model projections through time. Global Ecology and Biogeography, 22:242-251.
  • Type: Journal Articles Status: Published Year Published: 2013 Citation: Record, S., M. Fitzpatrick, A.O. Finley, S. Veloz, and A. Ellison. (2013) Should species distribution models account for spatial autocorrelation? A test of model projections across eight millennia of climate change. Global Ecology and Biogeography, 22:760-771.
  • Type: Journal Articles Status: Published Year Published: 2013 Citation: Babcock, C.*, J. Matney*, A.O. Finley, A. Weiskittel, and B.D. Cook. (2013) Multivariate spatial regression models for predicting individual tree structure variables using LiDAR data. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 6:6-14.
  • Type: Journal Articles Status: Published Year Published: 2013 Citation: Junttila, V., A.O. Finley, J.B. Bradford, and T. Kauranne. (2013) Strategies for minimizing sample size for use in airborne LiDAR-based forest inventory. Forest Ecology and Management, 292:75-85.
  • Type: Journal Articles Status: Published Year Published: 2013 Citation: Guhaniyogi, R., A.O. Finley, S. Banerjee, and Rich K. Kobe. (2013) Modeling complex spatial dependencies: low-rank spatially-varying cross-covariances with application to soil nutrient data. Journal of Agricultural, Biological, and Environmental Statistics, 18:274-298.
  • Type: Journal Articles Status: Published Year Published: 2013 Citation: Johnson, K.D., J.W. Harden, A.D. McGuire, M. Clark, F. Yuan, and A.O. Finley. (2013) Permafrost and organic layer interactions over a climate gradient in a discontinuous permafrost zone. Environmental Research Letters, 8:1-12.


Progress 01/01/13 to 09/30/13

Outputs
Target Audience: Environmental researchers, undergraduate and graduate students in environmental and quantitative sciences, and natural resources professionals.
Changes/Problems: Nothing Reported
What opportunities for training and professional development has the project provided? Short course entitled Bayesian Modeling for Spatial and Spatio-Temporal Data Analysis. The University of Nebraska–Lincoln, Department of Statistics, Lincoln, NE, on October 15-16, 2012. Short course entitled Applied Bayesian Spatio-temporal Data Analysis. National Ecological Observatory Network (NEON) Applied Bayesian Regression Workshop, March 7-8, 2013.
How have the results been disseminated to communities of interest? Short courses, software, and peer-reviewed publications.
What do you plan to do during the next reporting period to accomplish the goals? Over the next reporting period I will continue to develop statistical methods for modeling large space and time indexed data sets. These methods will be focused on fitting the desired models to massive data sets, while still delivering valid statistical inference and prediction.

Impacts
What was accomplished under these goals? Bayesian methods have become popular for spatio-temporal data modeling, given their flexibility to estimate models that would otherwise be infeasible. However, fitting hierarchical spatial models often involves expensive matrix decompositions whose computational cost grows cubically with the number of spatial locations and/or time points, rendering such models infeasible for large datasets. This situation is exacerbated in multivariate settings with several spatially dependent response variables, where the matrix dimensions increase by a factor of the number of response variables modeled. My work over the reporting period focused on addressing these modeling challenges. In particular, I developed a comprehensive process-based framework, referred to as the predictive process, for carrying out statistical inference on large spatial datasets collected using Geographical Information Systems (GIS) and related technologies arising from a wide variety of settings.
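As a rough illustration of the cubic cost described above, the following Python sketch evaluates a Gaussian spatial log-likelihood through a dense Cholesky factorization and counts the leading-order operations. It is not code from the project or from spBayes; the exponential covariance function, the parameter values (sigma2, phi, tau2), and all function names are assumptions chosen only for illustration.

```python
import numpy as np

def exp_cov(coords, sigma2=1.0, phi=3.0):
    """Exponential covariance sigma2 * exp(-phi * distance) between all pairs of rows."""
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    return sigma2 * np.exp(-phi * dist)

def cholesky_flops(n_locations, n_responses=1):
    """Leading-order operation count, m**3 / 3, for a dense m x m Cholesky factorization."""
    m = n_locations * n_responses  # multivariate models stack responses: m = n * q
    return m ** 3 / 3.0

def gp_loglik(y, coords, tau2=0.1):
    """Zero-mean Gaussian log-likelihood with spatial covariance plus nugget tau2."""
    n = len(y)
    cov = exp_cov(coords) + tau2 * np.eye(n)
    chol = np.linalg.cholesky(cov)   # dense factorization: the O(n^3) step
    z = np.linalg.solve(chol, y)     # z = L^{-1} y, so z @ z = y' cov^{-1} y
    log_det = 2.0 * np.log(np.diag(chol)).sum()
    return -0.5 * (z @ z + log_det + n * np.log(2.0 * np.pi))
```

Under this count, doubling the number of locations multiplies the work by 8, and modeling three responses jointly at the same locations multiplies it by 27. An MCMC sampler repeats a factorization of this kind at every iteration in which the covariance parameters change.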

Publications

  • Type: Journal Articles Status: Accepted Year Published: 2013 Citation: Swanson, A.*, S. Dobrowski, A.O. Finley, J.H. Thorne, and M.K. Schwartz. (2013) Spatial regression methods capture prediction uncertainty in species distribution model projections through time. Global Ecology and Biogeography. DOI: 10.1111/j.1466-8238.2012.00794.x
  • Type: Journal Articles Status: Accepted Year Published: 2013 Citation: Finley, A.O., S. Banerjee, B.D. Cook, and J.B. Bradford. (2013) Hierarchical Bayesian spatial models for predicting multiple forest variables using waveform LiDAR, hyperspectral imagery, and large inventory datasets. International Journal of Applied Earth Observation and Geoinformation. DOI: 10.1016/j.jag.2012.04.007
  • Type: Journal Articles Status: Accepted Year Published: 2013 Citation: Record, S., M. Fitzpatrick, A.O. Finley, S. Veloz, and A. Ellison. (2013) Should species distribution models account for spatial autocorrelation? A test of model projections across eight millennia of climate change. Global Ecology and Biogeography. DOI: 10.1111/geb.12017
  • Type: Journal Articles Status: Accepted Year Published: 2013 Citation: Babcock, C., J. Matney, A.O. Finley, A. Weiskittel, and B.D. Cook. (2013) Multivariate spatial regression models for predicting individual tree structure variables using LiDAR data. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing. DOI: 10.1109/JSTARS.2012.2215582
  • Type: Journal Articles Status: Accepted Year Published: 2013 Citation: Junttila, V., A.O. Finley, J.B. Bradford, and T. Kauranne. (2013) Strategies for minimizing sample size for use in airborne LiDAR-based forest inventory. Forest Ecology and Management. DOI: 10.1016/j.foreco.2012.12.019
  • Type: Journal Articles Status: Accepted Year Published: 2013 Citation: Guhaniyogi, R., A.O. Finley, S. Banerjee and Rich K. Kobe. (2013) Modeling complex spatial dependencies: low-rank spatially-varying cross-covariances with application to soil nutrient data. Journal of Agricultural, Biological, and Environmental Statistics. DOI: 10.1007/s13253-013-0140-3.


Progress 01/01/12 to 12/31/12

Outputs
OUTPUTS: My research and outreach efforts over the reporting period have advanced statistical theory, methodology, software, and instruction with the aim of enabling current and next-generation scientists and practitioners to more fully leverage the increasing wealth of environmental data to draw valid inference about large and complex spatial-temporal systems. Although much of my work is motivated by environmental monitoring data, the advancements in spatio-temporal data modeling are finding use in fields such as public and environmental health, agriculture, engineering, climate sciences, and geosciences where the fundamental goal is the same: use new findings to help improve society.
PARTICIPANTS: Nothing significant to report during this reporting period.
TARGET AUDIENCES: The proposed modeling frameworks and software are developed for environmental researchers and management practitioners.
PROJECT MODIFICATIONS: Nothing significant to report during this reporting period.

Impacts
Over the last decade, hierarchical models implemented through Markov chain Monte Carlo (MCMC) methods have become especially popular for spatio-temporal data modeling, given their flexibility to estimate models that would otherwise be infeasible. However, fitting hierarchical spatial models often involves expensive matrix decompositions whose computational cost grows cubically with the number of spatial locations and/or time points, rendering such models infeasible for large datasets. This situation is exacerbated in multivariate settings with several spatially dependent response variables, where the matrix dimensions increase by a factor of the number of response variables modeled. My work over the reporting period focused on addressing these modeling challenges. In particular, I developed a comprehensive process-based framework, referred to as the predictive process, for carrying out statistical inference on large spatial datasets collected using Geographical Information Systems (GIS) and related technologies arising from a wide variety of settings.

Publications

  • Babcock, C., J. Matney, A.O. Finley, A. Weiskittel, B. Cook. (2012) Multivariate spatial regression models for predicting individual tree structure variables using LiDAR data. IEEE J-STAR, 99:1-9.
  • Delamater, P.L., A.O. Finley, and S. Banerjee. (2012) An analysis of asthma hospitalizations, air pollution, and weather conditions in Los Angeles County, California. Science of the Total Environment, 425:110-118.
  • Baribault, T., R.K. Kobe, and A.O. Finley. (2012) Tropical tree growth is correlated with soil phosphorus, potassium, and calcium, though not for legumes. Ecological Monographs, 82:189-203.
  • Eidsvik, J., A.O. Finley, S. Banerjee, and H. Rue. (2012) Approximate Bayesian inference for large spatial datasets using predictive process models. Computational Statistics and Data Analysis, 56:1362-1380.


Progress 01/01/11 to 12/31/11

Outputs
OUTPUTS: Project results were disseminated via contributed and invited talks at the following conferences and workshops:
1) Finley, A.O. Bayesian dynamic modeling for large space-time datasets using Gaussian predictive processes. GEOMED, October 21, 2011. Victoria, British Columbia, Canada.
2) Finley, A.O. and S. Banerjee. Advances in hierarchical spatial models for mapping forest attributes across large domains. Case Studies in Bayesian Statistics and Machine Learning, October 15, 2011. Carnegie Mellon University, Pittsburgh, PA.
3) Banerjee, S., and A.O. Finley. Computationally feasible hierarchical modeling strategies for large spatial data sets. International Statistical Institute Conference, August 22, 2011. Dublin, Ireland. Invited.
4) Finley, A.O., S. Banerjee, and B. Cook. A Bayesian functional data model for predicting forest variables using high-dimensional waveform LiDAR over large geographic domains. International Statistical Institute Conference, August 22, 2011. Dublin, Ireland. Invited.
5) Guhaniyogi, R., S. Banerjee, and A.O. Finley. Computationally feasible hierarchical modeling strategies for large spatial data sets. American Statistical Association Joint Statistical Meeting. August 1, 2011. Miami, FL. Invited.
6) Finley, A.O., S. Banerjee, and B. Cook. A Bayesian functional data model for predicting forest variables using high-dimensional waveform LiDAR over large geographic domains. American Statistical Association Joint Statistical Meeting. August 1, 2011. Miami, FL. Invited.
7) Finley, A.O. Advances in hierarchical spatial models for quantifying forest attributes. May 4, 2011. Workshop on Statistical Issues in Forest Management. Centre de recherches mathematiques. Universite Laval, Quebec. Invited.
8) Finley, A.O., S. Banerjee, and B. Cook. Bayesian functional data model for predicting forest variables using high-dimensional waveform LiDAR over large geographic domains. March 24, 2011, 1st Conference on Spatial Statistics. University of Twente, The Netherlands.
9) Finley, A.O. Modeling and mapping non-stationary multivariate processes for large spatial datasets. March 22, 2011. Environmental Sciences Group, Wageningen University and Research Centre, Wageningen, The Netherlands. Invited.
10) Finley, A.O. Advances in hierarchical spatial models for quantifying forest attributes. February 21, 2011. Lappeenranta University of Technology, Department of Mathematics, Lappeenranta, Finland. Invited.
11) Finley, A.O., S. Banerjee, and B. Cook. Bayesian functional data model for predicting forest variables using high-dimensional waveform LiDAR over large geographic domains. December 17, 2010, American Geophysical Union. San Francisco, CA.
PARTICIPANTS: None, beyond the coauthors listed on the publications.
TARGET AUDIENCES: The target audience for the project output includes environmental statisticians, forest resource professionals, and forest/agricultural managers.
PROJECT MODIFICATIONS: Nothing significant to report during this reporting period.

Impacts
Large point-referenced datasets occur frequently in the environmental and natural sciences. Use of Bayesian hierarchical spatial models for analyzing these datasets is undermined by onerous computational burdens associated with parameter estimation. Low-rank spatial process models attempt to resolve this problem by projecting spatial effects to a lower-dimensional subspace. This subspace is determined by a judicious choice of "knots" or locations that are fixed a priori. One such representation yields a class of predictive process models for spatial and spatial-temporal data. My work over the reporting period expanded upon predictive process models with fixed knots to models that accommodate stochastic modeling of the knots. Here I view the knots as emerging from a point pattern and investigate how such adaptive specifications can yield more flexible hierarchical frameworks that lead to automated knot selection and substantial computational benefits. These models were illustrated using forest inventory, agricultural crop yield, and climate data.
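The fixed-knot projection described above can be sketched in a few lines of Python. This is a minimal illustration rather than the project's implementation: it covers only the basic fixed-knot predictive process (no stochastic knots and no bias adjustment), and the exponential covariance function, its parameter values, the nugget tau2, and all function names are assumptions introduced here.

```python
import numpy as np

def exp_cov(a, b, sigma2=1.0, phi=3.0):
    """Exponential cross-covariance between the rows of a and the rows of b."""
    dist = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return sigma2 * np.exp(-phi * dist)

def predictive_process_cov(coords, knots):
    """Low-rank covariance C(s, S*) C(S*, S*)^{-1} C(S*, s') induced by m knots S*."""
    c_nm = exp_cov(coords, knots)   # n x m
    c_mm = exp_cov(knots, knots)    # m x m
    return c_nm @ np.linalg.solve(c_mm, c_nm.T)

def woodbury_solve(coords, knots, y, tau2=0.1):
    """Solve (low-rank covariance + tau2 * I) x = y with only an m x m linear system.

    Uses the Sherman-Morrison-Woodbury identity, so the cost is O(n m^2)
    instead of the O(n^3) required by a dense n x n factorization.
    """
    c_nm = exp_cov(coords, knots)
    c_mm = exp_cov(knots, knots)
    inner = tau2 * c_mm + c_nm.T @ c_nm   # m x m
    return (y - c_nm @ np.linalg.solve(inner, c_nm.T @ y)) / tau2
```

With n observed locations and m knots (m much smaller than n), only m x m systems are ever solved. When the knots coincide with the observed locations, the low-rank covariance reproduces the parent covariance exactly, so the number and placement of knots governs the trade-off between computational cost and fidelity to the parent process.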

Publications

  • Eidsvik, J., A.O. Finley, S. Banerjee, and H. Rue. (2011) Approximate Bayesian inference for large spatial datasets using predictive process models. Computational Statistics and Data Analysis. doi:10.1016/j.csda.2011.10.022
  • Finley, A.O., S. Banerjee, and B. Basso (2011) Improving crop model inference through Bayesian melding with spatially-varying parameters. Journal of Agricultural, Biological, and Environmental Statistics. DOI: 10.1007/s13253-011-0070-x
  • Salazar, E., B. Sanso, A.O. Finley, D. Hammerling, I. Steinsland, X. Wang, and P. Delamater. (2011) Comparing and blending regional climate model predictions for the American Southwest. Journal of Agricultural, Biological, and Environmental Statistics. DOI: 10.1007/s13253-011-0074-6
  • Guhaniyogi, R., A.O. Finley, S. Banerjee, A.E. Gelfand. (2011) Adaptive Gaussian predictive process models for large spatial datasets. Environmetrics, DOI: 10.1002/env.1131
  • Finley, A.O., S. Banerjee, A.E. Gelfand. (2011) Bayesian dynamic modeling for large space-time datasets using Gaussian predictive processes. Journal of Geographical Systems, DOI: 10.1007/s10109-011-0154-8
  • Ren, Q., S. Banerjee, A.O. Finley, and J.S. Hodges. (2011) Variational Bayesian methods for spatial data analysis. Computational Statistics and Data Analysis, 55:3197-3217.
  • Finley, A.O., S. Banerjee, and D.W. MacFarlane. (2011) A hierarchical model for quantifying forest variables over large heterogeneous landscapes with uncertain forest areas. Journal of the American Statistical Association, 106:31-48.
  • Woodall, C.W., A.W. D'Amato, J.B. Bradford, and A.O. Finley. (2011) Effects of stand and inter-specific stocking on maximizing standing tree carbon stocks in the eastern U.S. Forest Science, 57:365-378.


Progress 01/01/10 to 12/31/10

Outputs
OUTPUTS: Research during this reporting period focused upon improving the quantification of forest biomass, which is a crucial input to large-scale forest management and climate change mitigation/adaptation initiatives. Following the project's objectives, the investigator was interested in predicting possibly multiple continuous forest variables (e.g., biomass, volume, age) at a fine resolution (e.g., pixel-level) across a specified domain. Given a definition of forest/non-forest, this prediction is typically a two-step process. The first step predicts which locations are forested. The second step predicts the value of the variable for only those forested locations. Rarely is forest/non-forest status predicted without error. However, the uncertainty in this prediction is typically not propagated through to the subsequent prediction of the forest variable of interest. Failure to acknowledge this error can result in biased and perhaps falsely precise estimates. In response to this problem, the investigator offered a modeling framework that allows propagation of this uncertainty by deploying two latent processes (one continuous and one binary) generating the data. This approach resulted in novel low-rank hierarchical structures for which specialized estimation algorithms were implemented.
PARTICIPANTS: Nothing significant to report during this reporting period.
TARGET AUDIENCES: The target audience includes natural resource managers, modelers of spatial-temporal data, and undergraduate/graduate students in environmental management and statistics.
PROJECT MODIFICATIONS: Not relevant to this project.
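The consequence of ignoring forest/non-forest uncertainty can be illustrated with a toy Monte Carlo comparison in Python. This is not the two-latent-process hierarchical model described above; it is a simplified sketch that assumes a known forest probability and a normal biomass distribution at each location, and the function names, the 0.5 classification threshold, and the example values are all hypothetical.

```python
import numpy as np

def plug_in(p_forest, mu, sd, n_draws, rng):
    """Two-step shortcut: classify each location once, then predict biomass
    only where the location was classified as forest."""
    forested = p_forest > 0.5
    biomass = rng.normal(mu, sd, size=(n_draws, len(mu)))
    return biomass * forested

def propagated(p_forest, mu, sd, n_draws, rng):
    """Draw the binary forest indicator together with biomass in every
    Monte Carlo draw, so classification uncertainty reaches the prediction."""
    is_forest = rng.random((n_draws, len(mu))) < p_forest
    biomass = rng.normal(mu, sd, size=(n_draws, len(mu)))
    return biomass * is_forest
```

For a single location with forest probability 0.6, biomass mean 100, and standard deviation 10, the plug-in draws center near 100 with a spread of about 10, while the propagated draws center near 60 (0.6 x 100) with a much larger spread. The plug-in summary is therefore both shifted and falsely precise relative to a prediction that carries the classification uncertainty forward.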

Impacts
Although development of the proposed methods was motivated by substantive questions in the investigator's domain of forestry, the potential advancements in spatiotemporal data modeling will find use in fields such as public and environmental health, meteorology, engineering, and geosciences where the fundamental goal is the same: use new findings to help improve society. By freeing current and subsequent generations of investigators from ad hoc and qualitative methods that often tell deceptive stories, the proposed methods can have far-reaching beneficial effects in environmental research that potentially touch unexpected corners of society.

Publications

  • Finley, A.O. (2010) Comparing spatially-varying coefficients models for analysis of ecological data with non-stationary and anisotropic residual dependence. Methods in Ecology and Evolution.
  • Munoz, J.D., A.O. Finley, R. Gehl, and S. Kravchenko. (2010) Nonlinear hierarchical models for predicting cover crop biomass using Normalized Difference Vegetation Index. Remote Sensing of Environment. 114:2833-2840.
  • Woodall, C.W., C.M. Oswalt, J.A. Westfall, C.H. Perry, M.D. Nelson, and A.O. Finley. (2010) Selecting tree species for testing climate change migration hypotheses using forest inventory data. Forest Ecology and Management. 259:778-785.
  • Banerjee, S., A.O. Finley, P. Waldmann, and T. Ericsson. (2010) Hierarchical spatial process models for multiple traits in large genetic trials. Journal of the American Statistical Association. 105:506-521.