Progress 10/01/08 to 09/30/09
Outputs
OUTPUTS: Over the past year the Avian Knowledge Network (AKN, http://www.avianknowledge.net) has made substantial progress in building a network of contributors, organizing massive quantities of observational data, making these data available, and providing intuitive explorations, visualizations, and analyses of these data. Members of the AKN project team, a continent-scale group of researchers from academic, federal agency, and non-governmental organizations, have focused on developing processes that will allow us to accurately predict species occurrence across broad spatial and temporal scales. To accomplish this we have been working with the massive volume of bird occurrence data made available through the AKN.
AKN DATA CENTERS: Development of the data centers that serve as the primary access nodes to the AKN progressed well. Three active data centers have been established (California Avian Data Center, Northeast Partners and Flight Data Center, and the Cornell Lab of Ornithology Data Center), with several more in development. Data centers greatly facilitate data discovery in distributed databases, but without customized toolkits for management and query they depend on expert database administrators and analysts. Because a goal of the AKN is to 'unlock the data' for the broadest possible audience, we have created a series of web-based tools for managing, querying, and visualizing avian observation data. Although large datasets are amenable to the summaries commonly used in research and reporting, patterns in the data may not be readily apparent, in part because the data may come from disparate sampling designs with varying quality-control filters. Specialized statistical techniques are often required to tease apart these patterns.
AKN REFERENCE DATASETS: As part of the work supported by this funding, the Cornell Lab of Ornithology Data Center has released the eBird Reference Dataset.
eBird (http://www.ebird.org) is a citizen science project that enlists the public in collecting large quantities of data across an array of habitats and locations over long spans of time. eBird is the largest dataset housed at the AKN and contains more than 16.5 million observations, gathered during more than 917 thousand sampling events, at more than 155 thousand locations throughout the Western Hemisphere and New Zealand.
PARTICIPANTS: Participant Individuals:
CoPrincipal Investigator(s): Andre A Dhondt; Grant Ballard; Daniel Fink
Technician, programmer(s): Kevin Webb; Tim Levatich; Dan Danowski; Doug Moody; Mark Herzog; Chris Rintoul
Senior personnel(s): Nadav Nur; Leonardo Salas; Marshall Iliff
Technician, programmer(s): Michael Fitzgibbon; Dennis Jongsomjit; Christine Howell; Diana Stralberg
Partner Organizations:
PRBO Conservation Science: Financial Support; In-kind Support; Facilities; Collaborative Research; Personnel Exchanges
USDA Forest Service Redwood Science Lab: In-kind Support; Facilities; Collaborative Research; Personnel Exchanges
Bird Studies Canada: In-kind Support; Collaborative Research; Personnel Exchanges
TARGET AUDIENCES: Educators and researchers from academic, federal agency, and non-governmental organizations who are focused on the development of processes that will allow us to accurately predict species occurrence across broad spatial and temporal scales.
PROJECT MODIFICATIONS: Not relevant to this project.
Impacts
The SpatioTemporal Exploratory Model (STEM): Modeling dynamic species distributions requires that analyses deal with spatiotemporal variation on two main scales. Ecological systems often exhibit strong homogeneity when viewed at "fine" or "local" scales, because many processes induce similarity among nearby observations. For example, the fine-scale spatial and temporal patterning of resources induces corresponding local distribution patterns, and juvenile dispersal limitations help define the extent of "locality". For this reason, the importance of accounting for spatial and temporal correlation has been broadly recognized. In contrast to fine-scale homogeneity, many ecological systems also exhibit strong heterogeneity when viewed at "coarse" or "global" scales. For example, individuals of the same species are known to occupy different specialized habitats at the edges of their distributions, and population dynamics processes such as the Allee effect and source-sink dynamics can create spatial patterning at relatively large spatial scales. Similarly, in the temporal domain, large-scale effects like El Nino/La Nina and the North Atlantic Oscillation create strong, relatively abrupt changes in population size and composition. The motivation for this work was to explore the continent-wide inter-annual migrations of common North American birds using data from the citizen science project eBird (http://www.ebird.org). This is challenging in part because migration dynamics vary greatly between species. To deal with this, we sought to develop a highly automated STEM capable of producing objective, dynamic species distribution estimates with a minimum of user inputs. STEM was compared to a simpler bagged decision tree model without any scale structure. We found that for species with highly dynamic annual migrations, STEM consistently outperformed the simpler bagged decision tree models. 
When applied to non-migratory species, STEM and the bagged decision trees achieved comparable performance.
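The contrast between a model with scale structure and one without can be illustrated with a small sketch. This is not the STEM implementation, which this report does not specify. It assumes, purely for illustration, that scale structure means fitting a simple base model inside overlapping spatiotemporal blocks and averaging the block predictions. The base learner here is a per-habitat detection frequency table standing in for decision trees, and the data, block sizes, and all function names are hypothetical.

```python
import random
from collections import defaultdict


def make_data(n, rng):
    """Synthetic checklists: (latitude, day_of_year, habitat, detected).

    Assumed toy species: occupies the northern half in summer and the
    southern half otherwise, and prefers forest when in range.
    """
    rows = []
    for _ in range(n):
        lat = rng.uniform(0.0, 100.0)
        day = rng.uniform(0.0, 365.0)
        habitat = rng.choice(["forest", "field"])
        summer = 90.0 <= day < 270.0
        in_range = (lat >= 50.0) if summer else (lat < 50.0)
        p = 0.05
        if in_range:
            p = 0.8 if habitat == "forest" else 0.3
        rows.append((lat, day, habitat, 1 if rng.random() < p else 0))
    return rows


def fit_rate_table(rows):
    """Base learner: detection frequency per habitat class."""
    counts = defaultdict(lambda: [0, 0])
    for _, _, habitat, y in rows:
        counts[habitat][0] += y
        counts[habitat][1] += 1
    return {h: s / n for h, (s, n) in counts.items()}


def block_keys(lat, day, lat_size=25.0, day_size=45.0):
    """Keys of the two overlapping blocks (offset grids) covering a point."""
    keys = []
    for offset in (0.0, 0.5):
        i = int((lat + offset * lat_size) // lat_size)
        j = int((day + offset * day_size) // day_size)
        keys.append((offset, i, j))
    return keys


def fit_local_ensemble(rows):
    """Fit one base learner per spatiotemporal block."""
    blocks = defaultdict(list)
    for row in rows:
        for key in block_keys(row[0], row[1]):
            blocks[key].append(row)
    return {key: fit_rate_table(members) for key, members in blocks.items()}


def predict_local(ensemble, global_table, lat, day, habitat):
    """Average the predictions of every block that covers the point."""
    preds = []
    for key in block_keys(lat, day):
        table = ensemble.get(key)
        if table is not None and habitat in table:
            preds.append(table[habitat])
    if not preds:  # no local information: fall back to the global model
        return global_table[habitat]
    return sum(preds) / len(preds)


def brier(pred_fn, rows):
    """Mean squared error between predicted probability and outcome."""
    return sum((pred_fn(lat, day, h) - y) ** 2 for lat, day, h, y in rows) / len(rows)


rng = random.Random(0)
train = make_data(4000, rng)
test = make_data(2000, rng)

global_table = fit_rate_table(train)
ensemble = fit_local_ensemble(train)

global_score = brier(lambda lat, day, h: global_table[h], test)
local_score = brier(
    lambda lat, day, h: predict_local(ensemble, global_table, lat, day, h), test
)
print(f"global model Brier score: {global_score:.3f}")
print(f"local ensemble Brier score: {local_score:.3f}")
```

Because the toy species changes its range with the season, the single global table averages over incompatible conditions, while the block-wise ensemble can follow the shift. A lower Brier score indicates better probabilistic predictions.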
Publications
- Sorokina D., Caruana R., Riedewald M., and Fink, D. 2008. Detecting Statistical Interactions with Additive Groves of Trees. In Proc. International Conference on Machine Learning (ICML), pages 1000-1007.
- Sullivan, B. L., C. L. Wood, M. I. Iliff, R. E. Bonney, D. Fink, and S. Kelling. 2009. eBird: A Citizen-based Bird Observation Network in the Biological Sciences. Biological Conservation 142:2282-2292.
- Fink, D. and Hochachka, W.M. 2009. Gaussian semiparametric analysis using hierarchical predictive models. Environmental and Ecological Statistics 3:1011-1035. Appearing in the special monograph Modeling Demographic Processes in Marked Populations, D.L. Thomson et al. (eds.).
- Hochachka, W.M., R. Caruana, D. Fink, A. Munson, M. Riedewald, D. Sorokina, and S. Kelling. 2007. Data mining for discovery of pattern and process in ecological systems. Journal of Wildlife Management 71(7):2427-2437.
- Kelling, S., Hochachka, W.M., Fink, D., Riedewald, M., Caruana, R., Ballard, G., and Hooker, G. 2009. Data-intensive Science: A New Paradigm for Biodiversity Studies. BioScience 59:613-620.
- Shaby, B. and Ruppert, D. 2009. Tapered Covariance: Bayesian Estimation, Asymptotics, and Applications. Submitted to Journal of the American Statistical Association.
- Fink, D., Hochachka, W., Zuckerberg, B., Winkler, D.W., Shaby, B., Munson, M.A., Hooker, G., Riedewald, M., Sheldon, D., and Kelling, S. 2009. Spatiotemporal exploratory models for broad-scale survey data. Submitted to Ecological Applications.