Source: UNIVERSITY OF MINNESOTA submitted to
FACT: CYBER-INFRASTRUCTURE FOR LANDSCAPE IMPACTS ON BIOCONTROL
Sponsoring Institution
National Institute of Food and Agriculture
Project Status
ACTIVE
Funding Source
Reporting Frequency
Annual
Accession No.
1023888
Grant No.
2020-67021-32477
Cumulative Award Amt.
$877,990.00
Proposal No.
2019-07456
Multistate No.
(N/A)
Project Start Date
Sep 1, 2020
Project End Date
Aug 31, 2025
Grant Year
2020
Program Code
[A1541]- Food and Agriculture Cyberinformatics and Tools
Project Director
Chaplin-Kramer, R.
Recipient Organization
UNIVERSITY OF MINNESOTA
200 OAK ST SE
MINNEAPOLIS,MN 55455-2009
Performing Department
Institute on the Environment
Non Technical Summary
Agricultural insect pests cause significant crop losses, despite widespread use of pesticides that pose significant risks to people and the environment. While a large and growing literature recognizes the potential for natural enemies resident in the broader agricultural landscape to provide critical biocontrol, the lack of a definitive answer to the question of how habitat around farmland impacts pests after two decades of research suggests a dramatically new approach is needed to understand spatial dynamics of insect populations. Our goal is to build an open-source, standardized data platform for pest control analysis and prediction, to enable scientific understanding and the development of decision-support tools to guide land managers and growers. We will work toward this goal by: expanding a pest control database developed by project PIs (encompassing 18,219 observations of biocontrol variables across 6,789 sites globally) to vastly increase information on georeferenced pest and natural enemy distributions; acquiring relevant life history traits data matching major pests and natural enemies represented in the pest control database; acquiring Earth observations (EO) data of vegetation and climate for georeferenced locations of insect data; and developing the software infrastructure to automate the continued acquisition of insect, traits, and EO data and processing of these disparate data sources necessary for analysis. Better understanding and predicting how landscapes and insect life history interact to determine where and whether biocontrol can provide a reliable strategy for growers will enable improved land management that enhances crop yields, reduces reliance on pesticides, and mitigates risk exposure of growers.
Animal Health Component
30%
Research Effort Categories
Basic
40%
Applied
30%
Developmental
30%
Classification

Knowledge Area (KA)    Subject of Investigation (SOI)    Field of Science (FOS)    Percent
215                    3199                              1070                      70%
215                    7210                              1070                      30%
Goals / Objectives
Our goal is to build an open-source, standardized data platform for pest control analysis and prediction, to enable scientific understanding and the development of decision-support tools to guide land managers and growers. Our specific activities to support this goal include:
1) Expand a current biocontrol database to greatly increase georeferenced information on pest and natural enemy distributions and activity. Seek out large-scale, long-term datasets held by government and private actors; ensure privacy of spatially referenced data.
2) Acquire relevant life history trait data matching major pests and natural enemies represented in the biocontrol database. Synthesize online databases currently scattered among many different taxonomic groups into one coherent framework of traits that predict pest survival, impact, and success (e.g., diet breadth, overwintering habit, generation time, dispersal mode), linked taxonomically to insect (biocontrol) data.
3) Acquire Earth observations (EO) data for georeferenced locations of insect data. Develop a computational pipeline to access and process global datasets of EO-derived vegetation features such as height or complexity, greenness (variability and timing), and spectral diversity, and EO-derived climatic features such as temperature, precipitation, and extreme weather events, linked spatially and temporally to study location and sampling dates of insect (biocontrol) data.
4) Develop the software infrastructure to automate (1) continued acquisition of insect, traits, and EO data, (2) data harmonization and standardization, and (3) basic analysis pathways. Build a data schema, application interface, and user interface to facilitate submission of raw data, quality assurance/quality control, and subsequent interpretation and prediction by scientists and decision-support tool developers.
5) Encourage adoption of the living database across the research community (including students and extension) by hosting training workshops and developing curriculum for course exercises.
Project Methods
Insect abundance and activity: Expanding the database and data standardization
The current database grew out of several previous syntheses on the impacts of landscape simplification on biocontrol services provided by natural enemies of agricultural pests (Figure 3). As such, studies were only included if biocontrol observations were obtained across at least 8 distinct sampling locations, all within crop fields and across a gradient of surrounding landscape composition. While this encompassed more than 130 studies, dozens more meeting the same criteria have already been published since the assembly of the database, and a far larger number of observations of pests or their enemies at individual point locations have been gathered without landscape variables in mind. Including such observations, typically undertaken by government agencies or private consultants, would greatly expand and enhance the biocontrol database. First, we will provide strict guidelines about the types of insect data that can be submitted; for example, including only measurements of insect abundance and activity on crop fields (rather than in weed patches or field margins). Second, we will build a carefully curated data submission portal with standardized fields and data input options to restrict variation in data quality; for example, defining what constitutes a locally diversified field and then requiring data providers to decide whether each site was diversified or not. Third, we will automate data-cleaning for fields that the data submission portal fails to standardize; for example, with scripts that harmonize crop Latin names across studies.
Our data submission portal for the living database will require that these insect abundance and activity measures include: location of observation (latitude and longitude), date of observation, taxa observed (to the highest specificity known, even if only order), type of observation (abundance, infestation rate, crop damage, predation rate, pest population suppression rate, crop yield, etc.), sampling method (trap, visual survey, cage study, etc.), crop type in which the observation was made, and numeric value for the observation itself.

Insect life history traits
Life history traits offer a simple predictive framework for determining which species will thrive (and which will decline) across environmental gradients and under alternative management interventions, and trait-based analyses can help contextualize generalized models. Multiple disparate efforts in different regions are currently compiling trait information for different taxa involved in biological control, from carabid beetles (http://www.carabids.org) to soil invertebrates (http://betsi.cesab.org). Thus, a major component of our data synthesis effort will be to work with researchers worldwide to unite these efforts to create one coherent trait database for crop pests and their natural enemies. We have already formed key partnerships with researchers in the United States and in Europe who have begun to compile trait data relevant to many species involved in biological control (see Letters of Collaboration; Management Plan). By supplementing existing efforts with focused literature searches and consultation with expert entomologists in our research network, we will develop a complete trait database for all crop pests and (if sufficient data exist) their natural enemies that are represented in our living database for biocontrol.
We will focus on traits that have been shown to be predictive in other systems and that are widely available for many taxa; for example, feeding specialization, habitat specialization, invasive status, alternate hosts, dispersal mode, body size, etc.

Advances in Earth observations of vegetation
There are many possibilities for different characteristics of vegetation that can be linked with insect life history traits to more mechanistically represent how pests and natural enemies may be using the landscape. Our initial research questions frame a series of hypotheses linking landscape to traits (see Research Questions, in Objectives, above). We expect this set of questions to evolve along with the living database. However, to start, we will consider how traits mediate insect response to both the crop and natural or semi-natural habitat fragments in and around the cropped areas, in terms of productivity, phenology, diversity or heterogeneity, and, to the extent possible, composition or presence of key functional types of vegetation.

Computational and data architecture
We are building a living database for biocontrol; we use the term "living database" to convey that the database will be able to adapt to changes in base data platforms and design, and to ingest new data sources as they arise. Updating databases has been constrained in the past by the often unanticipated costs of database management, such as the ingestion of new data sources, adaptation to the needs of the users of the database, and validation and data scrubbing. We will automate many of these tasks to keep data maintenance costs low, and dedicate the InVEST platform architecture to maintaining the software supporting this database in perpetuity. There are several phases and considerations in constructing a living database: initial ingestion of known data, support for scientists to submit data, automated ingestion of updates to remotely sensed data, and ongoing normal maintenance of a data platform. Throughout the development of the database, there will be a data validation team (composed of post-docs and graduate or advanced undergraduate students funded on this project) supervising the data uptake process and assessing the success of the data harmonization and quality control processes.
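
The required submission fields listed above (location, date, taxon, observation type, sampling method, crop, and value) lend themselves to a typed record with basic quality checks. The following is a minimal Python sketch under stated assumptions, not the project's actual schema: the `Observation` class, the `validate` function, and the two vocabularies are hypothetical names introduced for illustration, and the vocabularies contain only the examples named in the text.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical controlled vocabularies. The report lists these values only as
# examples (each list ends in "etc."), so a real portal would likely hold more.
OBSERVATION_TYPES = {
    "abundance", "infestation rate", "crop damage", "predation rate",
    "pest population suppression rate", "crop yield",
}
SAMPLING_METHODS = {"trap", "visual survey", "cage study"}


@dataclass
class Observation:
    """One submitted record with the fields the portal is described as requiring."""
    latitude: float
    longitude: float
    observed_on: date
    taxon: str              # highest specificity known, even if only order
    observation_type: str
    sampling_method: str
    crop: str
    value: float


def validate(obs: Observation) -> list[str]:
    """Return a list of problems found; an empty list means the record passes."""
    problems = []
    if not -90.0 <= obs.latitude <= 90.0:
        problems.append("latitude out of range")
    if not -180.0 <= obs.longitude <= 180.0:
        problems.append("longitude out of range")
    if obs.observed_on > date.today():
        problems.append("observation date is in the future")
    if not obs.taxon.strip():
        problems.append("taxon is required")
    if obs.observation_type not in OBSERVATION_TYPES:
        problems.append("unknown observation type: " + obs.observation_type)
    if obs.sampling_method not in SAMPLING_METHODS:
        problems.append("unknown sampling method: " + obs.sampling_method)
    if not obs.crop.strip():
        problems.append("crop type is required")
    return problems


# Illustrative, made-up record (not taken from the database).
example = Observation(36.6, -119.9, date(2019, 6, 15), "Lygus hesperus",
                      "abundance", "visual survey", "Gossypium hirsutum", 4.2)
print(validate(example))
```

Returning a list of problems rather than raising on the first error is one possible design choice; it would let a submission portal report every issue in a record at once.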

Progress 09/01/22 to 08/31/23

Outputs
Target Audience: Our target audience is the scientists and tool developers who work with growers, land managers, and other decision-makers, and who are constrained in their ability to reach these broader audiences by limitations in the data necessary to build the science and knowledge to confront pest control challenges. It is this information gap that our project proposes to fill. We emphasize this set of stakeholders because they have the interest and experience to conduct streamlined analyses and/or build new software decision-support systems to inform improved pest management, and are limited only by the data. To this end we have engaged >50 insect ecologists, >20 remote-sensing scientists, and >10 data scientists over the course of this year, based on their expertise, access to large datasets or products, and/or involvement in larger networks for their disciplines. These researchers have expanded our Coordinated Innovation Network, offering ideas on data sources and advice on how to approach new partners, as well as feedback on the functionalities we are building into our platform.

Changes/Problems: Nothing Reported

What opportunities for training and professional development has the project provided?
We have employed one postdoctoral fellow at UC Davis throughout the duration of this project (Sara Emery), who has just completed her term and begun a new position as a professor at Cornell University. Her professional development included:
1. Research: Emery was trained in a rich array of methods and research strategies throughout the project, to explore pest ecology in agricultural landscapes, to apply Earth Observations data and techniques to pest occurrence data, and to learn software engineering approaches for translating her work into decision-support tools. In doing so, Emery has learned multiple new analytical techniques, from leveraging Generalized Additive Mixed Models for complex ecoinformatic analyses to implementing Python scripts for downloading and processing Earth Observation data from Google Earth Engine.
2. Leadership and team management: Emery took on major leadership roles and gained key team management experience by coordinating the many moving parts of the project.
3. Communication: Emery was afforded opportunities and training to develop her written and oral scientific communication skills, including detailed feedback on manuscript writing and opportunities to deliver practice talks in small-group lab settings where lab members offered feedback on presentation design and speaking techniques.
4. Professional development: Shortly after arriving at UC Davis, Emery was encouraged to enroll in a popular career development class focused on developing job application materials and refining interview strategies. Following the course, co-PIs Karp and Rosenheim worked closely with Emery to develop and refine her faculty application materials (research, teaching, and diversity statements). Happily, this resulted in a successful outcome, with Emery securing an Extension faculty position at Cornell University.

We employed one PhD student (GSR) at UC Davis through the summer of 2023 (Mia Lippey), and she continues to collaborate with the team and the project through her dissertation. Her professional development has included:
Research: Lippey used this time to make progress on her research goals that align with the objectives of this grant. In particular, over the summer, she completed her first chapter, which examines the effects of landscapes on agricultural pests in California. She is currently preparing the manuscript for this research paper and plans to submit it to Ecological Applications over the next few months. (This grant will be in the acknowledgements.)
Additionally, Lippey has begun chapters 2 and 3 of her dissertation research, which will focus on 1) the effects of a warming climate on agricultural insect populations, and 2) whether a trait (thermal performance) can explain these effects. She spent a majority of the summer cleaning and preparing the data for these chapters, and has obtained results on two out of 30 species so far.
Skills: Lippey has developed her research skills during the course of this GSR, specifically in the areas of data science and computation/statistics. She worked closely with Rich Sharp, a computer programmer/engineer, to understand the process of cleaning and handling large quantities of data. Additionally, Lippey worked closely with Daniel Paredes and PI Jay Rosenheim to learn statistical methods for analyzing big data. Lippey has also greatly improved her skills in scientific communication and collaboration, which are essential to big data research. She has presented her work (related to the grant) at ESA for four consecutive years, and has created a network of close collaborators.

We have just hired a doctoral fellow in data science at the University of Minnesota (Colleen Miller). She is just beginning (as of fall 2023), but her professional development aims include the following:
Research
  • Machine learning and AI-adjacent analytical skills
  • Building upon previous experience using machine learning and experience/expertise in biostatistics
  • Developing and expanding coding and statistical skillset based on new types of statistical tests and measures, including decision-tree-based work
  • Expanding research interests to encompass social information
  • Integrating the human component of the global ecosystem into basic questions in ecology and agricultural entomology
  • Collaborating with scholars focused on human information to better model and understand biological control of pests and pest pressure in agricultural settings
  • Expanding experience in working with a variety of data types, including experimental data collected by a variety of individuals across a variety of environments
Leadership, team management, and communication
  • Engaging and networking with experts within sub-discipline
  • Engaging with experts in the fields of entomology, remote sensing, biostatistics, and ecology who are close collaborators or immediately adjacent to the project
  • Engaging and networking with experts outside of sub-discipline
  • Pursuing, as a distinct professional development goal, a permanent position in a non-tenure-track, non-academic system
  • Engaging with experts in software, economics, private industry, and non-academic institutions to broaden the impact and relevance of work as a scientist
  • Intertwining engagement with project goals (e.g., social data, different types of methodologies)

How have the results been disseminated to communities of interest?
We have collectively given over a dozen public presentations at various professional conferences, invited lectures, and seminars over this project period, describing the aims of the Living Database and what we have achieved so far. We have held two working groups alongside professional conferences as well as several virtual sessions beta-testing the functionality of the associated tools. We have also published 5 peer-reviewed papers in respected journals (PNAS, Ecological Applications, etc.) utilizing Living Database data or insights, with 4 more on the way.

What do you plan to do during the next reporting period to accomplish the goals?
1) Expand the biocontrol database. Although in the past reporting period we have been focused on utilizing the datasets acquired in the first years of the project, this next reporting period we will transition to acquiring a much larger number of datasets. We will do this in two ways: first, by pushing our beta version of the Living Database out to our collaborators in the Coordinated Innovation Network to solicit their own datasets, and second, by pursuing connections those collaborators have in government and the private sector.
2) Acquire relevant life history trait data. In the next reporting period we will use the hypotheses we are testing in our current studies to drive the integration of traits databases into the Living Database, focused on the traits shown to be most important in our analyses. In particular we will be exploring the following: native/non-native, specialist/generalist, body size/mass, dispersal type, dispersal distance, overwintering habitat, species range, and temperature optima.
3) Acquire Earth observations (EO) data. After much in-depth analysis of large datasets in a single region, we are now shifting gears to analyze a much broader set of EO variables across a broader array of studies. At a global scale, few patterns have emerged that broadly predict whether the biological control of agricultural pests will be successful. By combining multiple data types, including climate data, pest trait data, and landscape data at a high resolution, we intend to predict in what contexts biological control of pests will be effective. Given the massive potential power of biological and remote sensing data distributed globally, we will apply a machine learning approach to the question of biological control of agricultural pests to answer, "which aspects of the environment and traits of the community members are most important to predicting success of biological control?"
We will iteratively investigate this question at both the global and regional scale using the Living Database (including the original SESYNC database we started out with, encompassing ~7000 observations across 132 studies in 31 countries). By investigating first which predictors of variation in biological control success are most relevant at meso and global scales, we will harness this database's ability to reveal 'what matters' and eventually why and how biological control success varies. One new direction we intend to pursue with this broader set of studies and of EO variables globally is understanding how the history of a landscape may influence the success of biological control across different agricultural insect communities. The impact of landscape identity and variation on community turnover, trophic relationships, and evolution is a long-standing sub-discipline in ecology. However, during the Anthropocene the rate and type of land use change have altered dramatically, with landscapes transforming vastly in cover type, life history, and productivity over short timescales and, at times, producing novel landscapes. Yet this topic is still understudied within the agricultural insect community literature, particularly when applied to pest control. Understanding the stability or stochasticity of the broader environment over time and across many different farms and crops may help us understand the deeper context of pest management, specifically in the case of biological pest control. For example, locations throughout the globe have experienced varying degrees of stochasticity in their climate, in addition to modern climate change itself. Landscapes may also have changed considerably at some locations, with turnover in agricultural intensity, practice, and landscape heterogeneity over a similar time period, while other areas have not.
By introducing not only climate normal and anomaly data, but also landscape normals and anomalies, we intend to better represent the historical and evolutionary background of a given insect pest system. We will test whether the stability of a landscape influences the success of biological control of various types of agricultural insect pests using the Living Database. The history of the landscape and greater environment may have a bigger story to tell and help us understand where biocontrol of insect pests may work better, given both a spatial and temporal eco-evolutionary lens.
4) Develop the software infrastructure. This next period we will start the implementation of the back-end database, which houses data provided by users, to perform the following functions:
  • Clean and upload data: a user has a georeferenced pest or pest control dataset they want to contribute to the repository for others to be able to access and use.
  • Augment existing data: a user has existing point data and wants to use the living database to add remote-sensing variables to it without having to fetch those data themselves.
  • Search for data: a user has a location/timeframe they want to search in for pest-related data.
  • Community: some use cases suggest coordination among users with shared interests.
To achieve this we will be developing the following pieces of architecture:
  • Data Store - digital store to handle long-term raster and point data, a framework for data backup, and a framework to differentiate between public and authenticated direct access.
  • Back-End Database - design considerations for a data schema to support an evolving data catalog, user accounts, custom data views, and regular back-end batch processing.
  • Back-End REST API Framework - framework to couple back-end functionality with front-end components. Provides access to the functionality described above.
  • Front-End User Interface - component to provide user-facing functional pathways.
5) Encourage adoption of the living database across the research community. Up until now we have been mostly consulting the researchers already participating in our Coordinated Innovation Network. In the next reporting period we aim to expand that network greatly, through professional meetings and virtual lab meetings. Our work thus far has demonstrated the value of the Living Database, showing researchers what it can offer them and why it is worth sharing their data to gain access to these functionalities. In the final year we will return to our Coordinated Innovation Network to solicit the 30 datasets they already identified that meet our criteria (>100 site-years, measuring either pest densities, damage, or predation/parasitism, georeferenced or with spatial locations known within ~1 km), and to ask them to share the functionality of the platform in their research networks to solicit further datasets.
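
The climate and landscape "normals and anomalies" mentioned under item 3 above can be illustrated with a short calculation. This is a minimal sketch under stated assumptions, not the project's method: it takes the normal to be the mean of an annual series over a chosen baseline period, an anomaly to be a year's value minus that normal, and the standard deviation of the anomalies as one possible stability index. The function names `normals_and_anomalies` and `stability_index` are hypothetical, and the numbers are made up.

```python
from statistics import mean, pstdev


def normals_and_anomalies(series, baseline_years):
    """Split an annual series into a baseline normal and per-year anomalies.

    series: dict mapping year -> annual value at one site (for example mean
        growing-season temperature, or an annual vegetation index).
    baseline_years: iterable of years that define the normal.
    Returns (normal, anomalies), where anomalies maps year -> value - normal.
    """
    baseline_values = [series[y] for y in baseline_years if y in series]
    if not baseline_values:
        raise ValueError("no observations fall inside the baseline period")
    normal = mean(baseline_values)
    anomalies = {year: value - normal for year, value in series.items()}
    return normal, anomalies


def stability_index(anomalies):
    """Population standard deviation of the anomalies.

    Assumption for illustration only: a larger spread is read as a less
    stable (more stochastic) site history.
    """
    return pstdev(list(anomalies.values()))


# Made-up annual values for a single site.
site_series = {2001: 10.0, 2002: 12.0, 2003: 14.0, 2004: 16.0}
normal, anomalies = normals_and_anomalies(site_series, range(2001, 2004))
print(normal, anomalies, stability_index(anomalies))
```

The same two functions would apply unchanged to a landscape variable, which is the sense in which a "landscape normal" can parallel a climate normal.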

Impacts
What was accomplished under these goals?
OVERALL IMPACT: We completed in-depth analysis of two of the larger (>1,000 observations over >10 years) datasets we acquired in Year 1, in California and in Spain. We finalized scripts for detecting errors in data entry to speed the data cleaning process necessary to prepare the data for analysis, and have applied them to several datasets to great user satisfaction (saving "hours to days of prep time"). We also refined the computational pipeline we built in Years 1 and 2 for extracting Earth observation variables for each field observation, culminating in a beta web tool. We coordinated a larger working group between the insect ecology and remote-sensing communities at the Ecological Society of America to beta test the new tool for Earth observations variables extraction, identify new remote-sensing methods to integrate, and discuss issues with the storage and privacy of insect data to be hosted on this platform. This work is streamlining the data collection process with powerful tools that enable more rapid formation and testing of new scientific hypotheses regarding how insects respond to their surrounding landscape.

Accomplishments under specific goals:
1) Expand the biocontrol database, seeking out large-scale, long-term datasets. This reporting period we undertook an in-depth analysis of the large datasets we acquired in the first years of the project to showcase the value of Earth observations to this type of analysis. In one example, which we have written up and which is in its second round of revisions at Science of the Total Environment, we used 1,487 field-year observations of Lygus hesperus (Western Tarnished Plant Bug) densities in California cotton fields to determine whether integrating remotely sensed metrics of vegetation productivity and phenology into pest models could improve pest abundance analysis and prediction. Because L. hesperus often overwinters in non-crop vegetation, we predicted that pest abundances would peak on farms surrounded by more non-crop vegetation, especially when the non-crop vegetation is initially productive but then dries down early in the year, causing the pest to disperse into cotton fields during the early period of crop vulnerability. Aligning with our hypotheses, we found that Lygus densities were much higher on farms surrounded by more non-crop vegetation that is more productive (higher Enhanced Vegetation Index [EVI] area) and experiences dormancy earlier in the year. Specifically, models predicted Lygus densities were 15 times higher on farms surrounded by high versus low productivity non-crop vegetation (EVI area 293 vs. 57) and 2.5 times higher when dormancy occurred earlier versus later in the year (May 15 vs. June 30). Finally, we found that integrating these remote-sensing variables into landscape models provided significantly more accurate predictions of pest densities in cotton compared to models with categorical land cover metrics alone. Together, our work suggests that remote sensing variables can simultaneously advance our understanding of pest ecology and bolster the accuracy of pest abundance predictions.
2) Acquire relevant life history trait data matching major pests and natural enemies represented in the biocontrol database. We are continuing work on thermal traits, specifically to: 1) characterize how insect populations are responding to warmer temperatures, and 2) see if species-specific thermal optima explain these insect population responses to warmer temperatures. Our graduate student presented (at the Entomological Society of America this fall) her preliminary results for two species in the RAIF database: olive fly and olive moth.
The results were opposite to each other: olive moth performs better with heat, with abundances increasing at warmer temperatures (positive, very significant slope), while olive fly prefers cooler environments, with abundances decreasing at warmer temperatures (negative, very significant slope). We chose these two species first because their data were already cleaned, but unfortunately they lack the thermal performance data we need to analyze trait-based responses. Now that the data-cleaning scripts are complete, it is far easier to clean the remaining datasets. The next species we will analyze (Chrysoperla carnea) has thermal performance data, so we expect results for it soon. So far, we have acquired thermal performance data for 42 species that can be included in the trait-based analysis.
3) Acquire Earth observations (EO) data for georeferenced locations of insect data. We have obtained EO data on climate, vegetation, and other environmental variables (and, more importantly, the software infrastructure to extract them at any geolocation; see #4). For climate we are including growing season temperature variation and precipitation variation, max/min and cumulative precipitation over the growing season, minimum annual temperature, first thaw, final frost, dewpoint, snow cover, and wind speed. For vegetation we are including vegetation indices of greenness (EVI and NDVI); MODIS phenological variables; Dynamic Habitat Indices (as a measure of ecosystem-level diversity); GPP (Gross Primary Production) and NPP (Net Primary Production), which may better indicate productivity in terms of plant physiological function; LAI (Leaf-Area Index), which indicates the horizontal surface of the resource on which dynamics between pests and their predators play out; FPAR (Fraction of Photosynthetically Active Radiation), the variability of which has been used as a proxy for agricultural intensity or disturbance; and rumple texture as a measure of landscape diversity or heterogeneity. For other environmental variables we are including night lights, surface water distance, soil moisture, wildland-urban interface, slope, orientation, and altitude/elevation.
4) Develop the software infrastructure to automate tasks involved in the above goals.
EO extraction pipeline: During this reporting period we worked with our Coordinated Innovation Network to refine our framework for extracting Earth observation variables from the Google Earth Engine to allow for a broader set of variables to be included (see #3 above) and for specification of different time windows (rather than just annual variables). The pipeline is available in the ee_sampler.py and gee_point_sampler.py scripts at https://github.com/springinnovate/pestcontrol_livingdatabase
Data cleaning pipeline: The clean_table.py script in our pestcontrol_livingdatabase repository generates tables of the unique values for a given field, ranked by the similarity of variable names; the user can adjust these tables and then apply them to the dataset with the replace_in_table.py script to clean typos.
Data viewer: Our data viewer allows EO data in the Google Earth Engine to be loaded alongside sample points for inspection, with additional tools for point picking and "wiping" between two raster datasets for comparison.
Web app.This portal removes the details of the computational environment, data formats, and provides an user interface more familiar to researchers to perform the EO extraction. 5)Encourage adoption of the living database across the research community. In addition to the >20 public presentations we have given thus far describing the aims of the Living Database and what we have achieved so far, including the special session and working group organized at ESA this past summer, we have continued conversations with the insect ecologists, remote-sensing specialists, and data scientists within our Coordinated Innovation Network. We continue to hold weekly calls with our core team, and several members of our broader network beta-tested and provided feedback on various aspects of the Living Database to continue to refine it for greater adoption by a broader audience
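The data cleaning step described under #4 above (list the unique values of a field, rank look-alike values, let the user review the pairings, then apply the replacements) can be sketched as follows. This is a minimal illustration using only the Python standard library, not the code in clean_table.py or replace_in_table.py; the function names, the 0.85 similarity threshold, and the sample species list are all assumptions made for the example.

```python
from collections import Counter
from difflib import SequenceMatcher


def rank_similar_values(values, threshold=0.85):
    """Return (rare, common, score) tuples for distinct values that look alike.

    Values are ordered by frequency so that the less frequent spelling of a
    pair is proposed as the typo and the more frequent one as the target.
    The threshold is an illustrative assumption, not a project setting.
    """
    counts = Counter(v.strip() for v in values)
    unique = sorted(counts, key=lambda v: (-counts[v], v))
    pairs = []
    for i, common in enumerate(unique):
        for rare in unique[i + 1:]:
            score = SequenceMatcher(None, common.lower(), rare.lower()).ratio()
            if score >= threshold:
                pairs.append((rare, common, round(score, 3)))
    return sorted(pairs, key=lambda p: -p[2])


def apply_replacements(values, mapping):
    """Replace each value found in the reviewed mapping; leave others as-is."""
    return [mapping.get(v.strip(), v.strip()) for v in values]


# Hypothetical field values containing two misspellings.
species = [
    "Chrysoperla carnea", "Chrysoperla carnea", "Chrysoperla carnae",
    "Bactrocera oleae", "Bactrocera oleae", "Bactrocera olea", "Prays oleae",
]
candidates = rank_similar_values(species)
# In practice a person would review and edit the candidates before this step.
mapping = {rare: common for rare, common, _ in candidates}
cleaned = apply_replacements(species, mapping)
```

Keeping the ranking and the replacement as separate steps mirrors the two-script workflow described above: the ranked table is only a suggestion until a user has checked it.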

Publications

  • Type: Journal Articles Status: Under Review Year Published: 2024 Citation: Emery, S.E., J. Rosenheim, R. Chaplin-Kramer, R. Sharp, D. Karp. In revision. Leveraging satellite observations to predict agricultural pest densities and reveal ecological drivers of variation across landscapes. Science of the Total Environment.
  • Type: Theses/Dissertations Status: Other Year Published: 2024 Citation: Lippey, M.K., J. Rosenheim, E. Meineke, D. Karp, D. Paredes, S. Emery, R. Chaplin-Kramer, R. Sharp, M. Di Genova. In prep. One landscape does not fit all: Complex arthropod responses to surrounding land use in California citrus. Ecological Applications.
  • Type: Journal Articles Status: Other Year Published: 2024 Citation: Related efforts (collaborators in the Coordinated Innovation Network who were not funded by this project but whose work was enabled by the Living Database): Perrot Thomas, Beaumelle Léa, Thomine Eva, Desneux Nicolas, Tricault Yann, Albrecht Matthias, Batáry Péter, Bianchi Felix, Birkhofer Klaus, Bosem Baillod Aliette, Chaplin-Kramer Becky, Cortesero Anne-Marie, Costamagna Alejandro, Diekoetter Tim, Ekroos Johan, Gagic Vesna, Huseth Anders, Jankovic Marina, Karp, Daniel S., Krauss Jochen, Lavandero Blas, Le Ralec Anne, Lu Yanhui, Mitchell Matthew, Molina Gonzalo AR, Palmu Erkki, Perovic David J., Martin Emily A., Redlich Sarah, Saulais Julia, Schmidt-Entling Martin H., Schneider Louis, Steffan-Dewenter Ingolf, Sutter Louis, Tschumi Matthias, Rusch Adrien. In prep. Global analysis reveals that crop diversification increases pest control services only in landscapes with significant amount of semi-natural habitats.
  • Type: Journal Articles Status: Other Year Published: 2024 Citation: Related efforts (collaborators in the Coordinated Innovation Network who were not funded by this project but whose work was enabled by the Living Database): K. Poveda, D.S. Karp, R. Chaplin-Kramer, M. Centrella, T. Luttermoser, R. Perez-Alvarez, M. O'Rourke, E.A. Martin & H. Grab. In prep. The importance of landscape composition for crop yield: a global quantitative synthesis.
  • Type: Conference Papers and Presentations Status: Published Year Published: 2023 Citation: We hosted a special session at the Ecological Society of America meeting in August 2023 ("New approaches to age-old problems: data science for pest ecology at the landscape scale"), which highlighted advances, insights gained, and emerging frontiers in the landscape ecology of insect dynamics made possible through ecoinformatics, remote-sensing, machine-learning, and other data science approaches. Our featured speakers came from universities, government agencies, NGOs, and the private sector, which helped forge new transdisciplinary collaborations in pest control ecology. We gave the first talk in this session, entitled "Integrating field ecology, remote-sensing, and data science to build a living database for biocontrol" (Chaplin-Kramer, Emery, Karp, Rosenheim, and Sharp). Sara Emery also gave a talk in a separate session: "Can remote sensing explain variation in functionally-similar pest densities across space?" (Emery, Rosenheim, Chaplin-Kramer and Karp).
  • Type: Other Status: Published Year Published: 2023 Citation: Sara Emery also gave several seminars on the project (entitled "Big data, conservation biological control, and precision ecology") at various universities around the world: UC Riverside (February 6, 2023); Penn State (February 13, 2023); University of Melbourne (February 21, 2023); Leiden University (February 28, 2023); Cornell University (May 8, 2023).


Progress 09/01/21 to 08/31/22

Outputs
Target Audience:Our target audience is the scientists and tool developers who work with growers, land managers, and other decision-makers, who are constrained in their ability to reach these broader audiences by limitations in the data necessary to build the science and knowledge to confront pest control challenges. It is this information gap they face that our project is proposing to fill. We emphasize this set of stakeholders is so important because they have the interest and experience to conduct streamlined analyses and/or build new software decision-support systems to inform improved pest management, and are limited only by the data. To this end we have engaged 20 insect ecologists, 10 remote-sensing scientists, and 8 data scientists, over the course of this year, based on their expertise, access to large datasets or products, and/or involvement in larger networks for their disciplines. These researchers have served as our first-wave participants in our Coordinated Innovation Network, offering ideas on data sources and advice on how to approach new partners, as well as feedback on the functionalities we are building into our platform. Changes/Problems:We are planning to hire a data science postdoc at the University of Minnesota in this reporting period, who will work on the project for the duration of the grant. Our original plan had been to hire a remote-sensing postdoc early on in the project, but between our own staff remote-sensing specialists, the remote-sensing expertise within our software team at subawardee Spring, and the valuable advice and guidance provided by the remote-sensing collaborators in the Coordinated Innovation Network, we did not wind up needing a postdoc with that speciality. 
Instead, we have determined that a data scientist to work more closely with the software team at Spring would be useful to bringing the web portal into reality, in terms of visualization and user interface considerations, as well as proposing and testing different analytic capabilities within the portal. This postdoc will also place more of a focus on soliciting additional datasets than we have to date, and will lead a "data paper" that all data contributors will participate in. Our current postdoc at UC Davis will be winding down at the end of this reporting period, so they will overlap for several months before entering the final phase of the project that aims to professionalize and publicly launch the Living Database. What opportunities for training and professional development has the project provided? We have employed one postdoctoral fellow throughout the duration of this project (Sara Emery), who has gained key skills and developed professionally in at least four areas. 1. Research: Emery has been trained in a rich array of methods and research strategies throughout the project. Most notably, she has worked closely with our highly transdisciplinary team to design and execute both basic and applied research. Specifically, Emery has worked with co-PIs Karp and Rosenheim to explore pest ecology in agricultural landscapes, PI Chaplin-Kramer to apply Earth Observations data and techniques to pest occurrence data, and Collaborator Sharp to learn software engineering approaches for translating her work into decision-support tools. In doing so, Emery has learned multiple new analytical techniques, from leveraging Generalized Additive Mixed Models for complex ecoinformatic analyses to implementing Python scripts for processing and downloading Earth Observation data from Google Earth Engine. 2. Leadership and team management: Emery has taken on major leadership roles and gained key team management experience by coordinating the many moving parts of the project.
For example, she has helped build and maintain relationships with our coordinated research network, taking a leading role in reaching out to and regularly communicating with new data providers. She has also helped organize and run our weekly to biweekly project meetings. 3. Communication: Emery has been afforded opportunities and training to develop her written and oral scientific communication skills. For example, Karp and Rosenheim (her primary research mentors) are working closely with her on writing her first manuscript from this project, and, in doing so, exchanging ideas about writing techniques (e.g., effective outlining practices). Emery has also been afforded opportunities to present her work to fellow scientists, for example, in the UC Davis Department of Wildlife, Fish, and Conservation Biology Seminar Series and at the Ecological Society of America annual conference. Prior to these events, Emery delivered practice talks in small-group lab settings where lab members offered feedback on presentation design and speaking techniques. 4. Professional development: Shortly after arriving at UC Davis, Emery was encouraged to enroll in a popular career development class, offered to UC Davis ecology graduate students and postdocs, focused on developing job application materials and refining interview strategies. Following the course, co-PIs Karp and Rosenheim worked closely with Emery to develop and refine her faculty application materials (research, teaching, and diversity statements). After Emery progressed to faculty interview stages, Karp and Rosenheim then worked with her to develop her first job talk, offering both individual feedback and providing space in lab meetings for her to practice and receive feedback from a larger group of students, postdocs, and faculty.
How have the results been disseminated to communities of interest?We have collectively given 9 public presentations at various professional conferences, invited lectures, and seminars over this project period, describing the aims of the Living Database and what we have achieved so far. We have also published 5 peer reviewed papers in respected journals (PNAS, Ecological Applications, etc.) utilizing Living Database data or insights. What do you plan to do during the next reporting period to accomplish the goals? 1) Expand the biocontrol database Although in the past reporting period we have been focused on utilizing the datasets acquired in the first year of the project, this next reporting period we will transition to acquiring a much larger number of datasets. We will do this in two ways: first, pushing our beta version of the Living Database out to our collaborators in the Coordinated Innovation Network (once the web portal is built; see #4) to solicit their own datasets, and second by pursuing connections those collaborators have in government and private sector. We anticipate that gaining cooperation from such partners will be facilitated by certain analytical capabilities that we may include in the web portal, based on user feedback, such as generating maps of predicted pest abundances. Through consultation with our network we have determined that privacy remains a major issue for many of these partners and we will work to demonstrate how the geolocations can be used to produce predictor variables from Earth observations but then can be abstracted away from public access in the Living Database. We also hope to approach the USDA this reporting period in the same way, and hope to set up a meeting with program officers to demonstrate the capabilities of the Living Database and explore possibilities within datasets held by, for example, the Risk Management Agency. 
2) Acquire relevant life history trait data Our focus on life history traits thus far has been hypothesis driven and mostly outside of the Living Database platform (e.g., thermal tolerances linked to field measurements of temperature; modeled population trajectories of generalists with different assumptions of overwintering and resource use). In the next reporting period we will use these hypotheses to drive the integration of traits databases into the Living Database, focused on the traits shown to be most important in current studies. For example, pest control by generalist predators appears to be most strongly influenced by the presence of alternate prey in the crop before colonization by pests occurs. We are exploring the use of the Living Database to test how well theoretical models can predict differences we see between sites/geographies in the real world, or even between different years at one particular site. By initializing the model with the insect densities at colonization each year (based on field data), the growing degree days in that year (based on EO-derived temperature), and leaf-area index across the growing season (also based on EO), we can test how well modeled densities match observed densities in the field. We are already aware of several trait databases to pull data from, including some focused specifically on invertebrates (carabid beetles, http://www.carabids.org; soil invertebrates, http://betsi.cesab.org; spiders, https://spidertraits.sci.muni.cz; ants, http://globalants.org), as well as broader datasets on traits of interest such as thermal tolerances (https://datadryad.org/stash/dataset/doi:10.5061/dryad.1cv08), body size and metabolism (https://github.com/animaltraits/animaltraits.github.io), and inter-species interactions (https://biotraits.ucla.edu/index.php).
In the coming year we will also solicit traits data from collaborators in our Coordinated Innovation Network, and we will develop an API to connect with the Open Traits portal (https://opentraits.org) to facilitate discovery of additional datasets, ensuring that any traits data we acquire conform to their standards. 3) Acquire Earth observations (EO) data In the next project period we will be working with remote-sensing collaborators at University of Michigan in our Coordinated Innovation Network to explore the development of new datasets that may provide important information for pest control modeling, derived from raw EO imagery from Landsat and Sentinel. At our workshop at the ESA meeting, we identified crop-type and crop-management mapping from random-forest modeling as one of the most promising paths forward for bringing innovations in remote-sensing into the Living Database. We also discussed hyperspectral data, and determined that the data are not yet widely enough available to be useful for the datasets we have in hand, given the long time periods over which we have acquired insect data (dating back to the 1990s). Therefore we are pursuing random forest modeling with our collaborators, who have developed a Google Earth Engine script that trains and applies a random forest classifier using Sentinel-2 imagery (https://code.earthengine.google.com/a02f236618bbc894c97dde458b3e908f). We plan to adapt this code to also utilize Landsat imagery, for which there is a longer record that coincides with our pest control data. This code can be used to identify different crop types and other features (including management practices) from ground-based observations that can serve as training data. This method has been used to identify tillage practices across India's main grain producing region, the Indo-Gangetic Plains (Zhou et al. Remote Sens. 2021 DOI:10.3390/rs13245108), and the US Corn Belt (Azzari et al. Remote Sens. Env.
2019 DOI: 10.1016/j.rse.2018.11.010), and has been found to be fairly generalizable using both Sentinel-2 and Landsat data. 4) Develop the software infrastructure We have determined through our beta testing with collaborators in our coordinated innovation network that most users find interacting with command line Python scripts to be a barrier to interfacing with our EO sampling framework. While some users found these tools to be helpful, most users encountered issues related to setting up computational environments, unexpected interactions between different geographic Unicode schemes, and a general unfamiliarity in using command line software as a research tool. To alleviate these issues, we will develop a web portal in the next reporting period that will remove the details of the computational environment, data formats, and provide an interface more familiar to researchers. Initial prototypes of the web portal will allow users to upload a CSV of field sample data (at least lat/lng and species name) and receive a CSV with relevant EO and taxonomy data based on the intersectionality of spatial lat/lng and semantic species name. We expect to rapidly iterate on new features and functionality throughout the reporting period guided by user feedback, such as basic analytical tools (e.g., https://scikit-learn.org/) that could allow the user to produce a raster map of predicted densities based on their field observations at point locations and the predictor variables in the Living Database. In addition to web portal development, we will start the implementation of the back-end database which houses data provided by users. Since it is very difficult to predict all the types of features/metadata provided by users, we will design the database schema to support the ability to add an indefinite number of new features to field sites without incurring large overhead to the database. 
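One common way to let field sites carry an open-ended set of user-supplied features, as described at the end of item 4 above, is an entity-attribute-value layout: a fixed table for the fields every site has, plus a narrow table with one row per (site, feature) pair. The sketch below uses Python's built-in sqlite3 module to illustrate the idea. It is an assumption about one possible design, not the project's actual schema; the table names, column names, helper functions, and sample records are all hypothetical.

```python
import sqlite3

# In-memory database for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE site (
    site_id INTEGER PRIMARY KEY,
    lat     REAL NOT NULL,
    lng     REAL NOT NULL,
    species TEXT NOT NULL
);
-- One row per (site, feature): new feature names need no schema change.
CREATE TABLE site_feature (
    site_id INTEGER NOT NULL REFERENCES site(site_id),
    name    TEXT NOT NULL,
    value   TEXT,
    PRIMARY KEY (site_id, name)
);
""")


def add_site(conn, lat, lng, species, **features):
    """Insert a site and any number of extra named features; return its id."""
    cur = conn.execute(
        "INSERT INTO site (lat, lng, species) VALUES (?, ?, ?)",
        (lat, lng, species),
    )
    site_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO site_feature (site_id, name, value) VALUES (?, ?, ?)",
        [(site_id, name, str(value)) for name, value in features.items()],
    )
    return site_id


def get_features(conn, site_id):
    """Return the extra features of one site as a {name: value} dict."""
    rows = conn.execute(
        "SELECT name, value FROM site_feature WHERE site_id = ? ORDER BY name",
        (site_id,),
    )
    return dict(rows.fetchall())


# Two hypothetical uploads with different sets of extra columns.
site_a = add_site(conn, 37.5, -4.1, "Prays oleae", crop="olive", trap_type="delta")
site_b = add_site(conn, 36.7, -119.8, "Aonidiella aurantii",
                  crop="citrus", irrigation="drip", field_size_ha=12.5)
```

The trade-off of this layout is that feature values are stored as text and must be typed on the way out; a JSON column or per-type value columns would be alternative choices.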
5) Encourage adoption of the living database across the research community Up until now we have been mostly consulting the researchers already participating in our Coordinated Innovation Network. This reporting period we aim to expand that network greatly, through professional meetings and virtual lab meetings. We have applied to host a session at next summer's Ecological Society of America (ESA) meeting and we will also be applying in April to host a session at next winter's American Geophysical Union (AGU) meeting. If accepted, our session at the ESA meeting in Portland next August will feature speakers from universities, government agencies, NGOs, and the private sector, and this session would kick off a 1-day workshop afterwards to dive into the Living Database and forge new transdisciplinary collaborations in pest control ecology.
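As a concrete reference for the growing degree days mentioned under item 2 above, the widely used simple-averaging method accumulates, for each day, the amount by which the mean of the daily minimum and maximum temperature exceeds a base temperature. The sketch below is illustrative only: the 10 °C base temperature and the sample temperatures are assumptions, and in practice a species-specific developmental threshold would come from the traits data.

```python
def growing_degree_days(daily_min_max, base_temp_c=10.0):
    """Accumulate growing degree days with the simple averaging method.

    daily_min_max: iterable of (t_min, t_max) pairs in degrees Celsius.
    base_temp_c: developmental threshold; 10.0 is an illustrative assumption.
    Days whose mean temperature is below the base contribute zero.
    """
    total = 0.0
    for t_min, t_max in daily_min_max:
        total += max(0.0, (t_min + t_max) / 2.0 - base_temp_c)
    return total


# Hypothetical daily (min, max) temperatures for four days.
days = [(8.0, 18.0), (10.0, 22.0), (4.0, 12.0), (12.0, 26.0)]
gdd = growing_degree_days(days)  # 3 + 6 + 0 + 9 = 18 degree days
```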

Impacts
What was accomplished under these goals? 1) Expand the biocontrol database We have begun active analysis on the large datasets we acquired in the first year of the project [50,000 site-years with multiple observations per site per year (including pest densities, infestation rates, and crop damage over the growing season) from a national dataset in Spain spanning 8 crops over 13 years; a combined >3200 site years in California (>1200 over 11 years in citrus and ~3000 over 18 years in cotton); and >600 in cotton in Australia over 5 years] as well as the SESYNC database we started out with (encompassing ~7000 observations across 132 studies in 31 countries). A necessary next step before acquiring additional datasets was to demonstrate to researchers what the Living Database can offer them and why it's worth sharing their data so they can gain access to these functionalities (see #s3 and 4). Now that we can demonstrate the value of the platform we will be returning to our Coordinated Innovation Network to solicit the 30 datasets they already identified that meet our criteria (>100 site-years, measuring either pest densities, damage, or predation/parasitism, georeferenced or with spatial locations known within ~1 km), and to ask them to further share the functionality of the platform in their research networks to solicit further datasets. 2) Acquire relevant life history trait data We focused on one main life history trait in this project period: thermal tolerance. We are gathering data on thermal performance curves (looking at how developmental success, development rate, and lifetime reproductive output vary with changing temperature) for a suite of insect pests associated with the RAIF datasets as well as for the citrus and cotton datasets described above. 
We are focusing on this trait first because agricultural pest survival and reproduction in response to a wide range of temperatures are routinely studied in laboratories, which means ecological and physiological data are available for a diversity of agricultural pests (including the >150 species represented in the dozen different crops comprising our current Living Database). Collaborators in our Coordinated Innovation Network are exploring other traits through theoretical modeling: overwintering habit, specialist vs. generalist, floral resource use. Their modeling is setting up hypotheses for us to test through integration of traits data in our Living Database. 3) Acquire Earth observations (EO) data We have obtained EO data on climate, vegetation and land cover for the geolocations of our insect datasets thus far (and importantly, developed the pipeline to replicate for any new geolocations; see #4). For climate we report the cumulative, mean, and variance over any user-specified intervals, up to the date ranges included in the insect sampling data, based on daily temperature and precipitation data from CHIRPS (Climate Hazards Group InfraRed Precipitation With Station; at 5 km resolution), and from ERA5 (Latest Climate Reanalysis Produced by ECMWF / Copernicus Climate Change Service; at 30 km). For US-based sites, the user can also select DAYMET data at 1 km resolution.
For vegetation we are using Landsat-based (30 m) Vegetation Indices (EVI, NDVI, measures of "greenness"), for the sampling date or summarized over the growing season; MODIS-based (500 m) phenological variables such as greenup, mid-greenup, peak, maturity, mid-greendown, senescence, and dormancy; GPP (Gross Primary Production) and NPP (Net Primary Production), which may better indicate productivity in terms of plant physiological function rather than merely greenness; LAI (Leaf-Area Index), which indicates the horizontal surface of the resource on which dynamics between pests and their predators play out; and FPAR (Fraction of Photosynthetically Active Radiation), the variability of which has been used as a proxy for agricultural intensity or disturbance. For all of these variables (climate and vegetation) we summarize over a user-defined buffer size (with current applications running as far as 30 km), for crop and non-crop pixels within that buffer, using a selection of land cover products that the user can choose from. For US sites the user can select either the USGS National Land Cover Dataset (at 30 m; available for 2001, 2004, 2006, 2008, 2011, 2013, 2016, and 2019) or NASS Cropland Data Layer (also at 30 m; available for 1997-2021 for specific crops), and for European sites the user can select the CORINE Land Cover Dataset (at 100 m; available for 1989 to 1998, 1999 to 2001, 2005 to 2007, 2011 to 2012, and 2017 to 2018). For all locations globally, the user can select from MODIS land cover (at 500 m, available annually from 2001-2020), GFSAD1000 cropland extent (at 1 km, a highly accurate Multi-Study Crop Mask for the year 2010), or Dynamic World (at 10 m, available at near-real time, at daily intervals, from 2015).
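The interval summaries described for the climate variables above (cumulative, mean, and variance over a user-specified date range) reduce to a small computation once the daily values for a location are in hand. The sketch below shows that computation on a plain list of (date, value) pairs; it does not call the Google Earth Engine, the function name and sample values are hypothetical, and the use of population variance (rather than sample variance) is an assumption.

```python
from datetime import date
from statistics import mean, pvariance


def summarize_interval(daily, start, end):
    """Summarize daily values whose dates fall within [start, end].

    daily: iterable of (datetime.date, float) pairs.
    Returns the cumulative total, mean, and population variance.
    """
    values = [value for day, value in daily if start <= day <= end]
    if not values:
        raise ValueError("no observations in the requested interval")
    return {
        "cumulative": sum(values),
        "mean": mean(values),
        "variance": pvariance(values),
    }


# Hypothetical daily precipitation (mm) at one sample point.
precip_mm = [
    (date(2021, 4, 1), 0.0),
    (date(2021, 4, 2), 4.0),
    (date(2021, 4, 3), 8.0),
    (date(2021, 4, 4), 0.0),
    (date(2021, 4, 5), 2.0),
]
summary = summarize_interval(precip_mm, date(2021, 4, 2), date(2021, 4, 4))
```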
4) Develop the software infrastructure to automate tasks involved in the above goals. EO extraction pipeline: During this reporting period we refined our framework for extracting Earth observation variables from the Google Earth Engine, based on use cases from project staff, to develop several new features, including: 1) a point-sampling technique that separates user-defined sample points into "training" and "holdback" datasets such that the holdback datasets are spatially co-located rather than randomly sampled, which is critical in modern machine-learning model development for showing that a model is not overtrained; 2) an extensible framework that allows additional land-cover-based masks and other EO datasets to be summarized by different land cover types, with hooks that allow additional types of datasets to be cleanly integrated into the pipeline through .INI files; 3) an extension of the pipeline to report the EO variables described above within a user-defined buffer rather than a fixed area; 4) improvements to run time: the previous implementation of the sampling pipeline took minutes to hours to extract EO-based variables, and we have optimized it using batch queries to reduce response times by two orders of magnitude (i.e., seconds instead of minutes). Prototype Data Viewer. A standard research query for field sites often involves hundreds of points to be cross-referenced against EO data, which in turn can have further refinements based on landcover classification or user-defined areas. The result of these analyses is a table that can have dozens of columns along with the hundreds of rows of data. Generally these data are used to build regression models, but researchers often want to build intuition about, or confidence in, the data against which they are building models.
We have built a prototype data viewer for these data in the Google Earth Engine so that a researcher can load sample points and overlay any of the data layers provided by our sampling pipeline. We include additional tools to allow for point picking and "wiping" between two raster datasets for comparison. While this tool is in an early prototype stage, it is being actively used by researchers in their data analysis. 5) Encourage adoption of the living database across the research community In addition to the workshop organized at ESA this past summer, we have continued conversations with the insect ecologists, remote-sensing specialists, and data scientists within our Coordinated Innovation Network. We held weekly calls with our core team, and met on a monthly or more frequent basis with rotating members of the network as different issues related to insect data and modeling, remote-sensing, and data science arose. Several members of our core team and network participated in beta-testing various aspects of the Living Database and provided feedback, which has led to a sound design for our Living Database; we will be building a web interface for it this year to encourage further adoption by a broader audience.
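The spatially co-located holdback described under item 4 above can be illustrated with a simple grid-block split: points are grouped into latitude/longitude cells, and whole cells (rather than individual points) are assigned to the holdback set, so that holdback points are not interleaved with training points. This is a minimal sketch of one way to achieve that property, not the method implemented in the project's sampling pipeline; the function name, the 1-degree block size, the 20% holdback fraction, and the sample points are all assumptions.

```python
import math
import random


def spatial_holdback_split(points, block_deg=1.0, holdback_frac=0.2, seed=0):
    """Split (lat, lng) points so that whole grid blocks are held back.

    Every point in a block goes to the same side of the split, which keeps
    holdback points spatially clustered instead of scattered at random.
    At least one block is always held back.
    """
    blocks = {}
    for point in points:
        key = (math.floor(point[0] / block_deg), math.floor(point[1] / block_deg))
        blocks.setdefault(key, []).append(point)
    keys = sorted(blocks)
    random.Random(seed).shuffle(keys)  # seeded for reproducibility
    n_holdback = max(1, round(len(keys) * holdback_frac))
    training = [p for key in keys[n_holdback:] for p in blocks[key]]
    holdback = [p for key in keys[:n_holdback] for p in blocks[key]]
    return training, holdback


# Hypothetical sample points scattered over a 5 x 5 degree region.
rng = random.Random(42)
points = [(rng.uniform(36.0, 41.0), rng.uniform(-6.0, -1.0)) for _ in range(200)]
training, holdback = spatial_holdback_split(points)
```

A model evaluated on the holdback blocks is then tested on locations it has not seen nearby, which gives a more honest estimate of how well it transfers to new areas than a random point-level split.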

Publications

  • Type: Journal Articles Status: Accepted Year Published: 2023 Citation: Rosenheim, J. A., E. Cluff, M. K. Lippey, B. N. Cass, D. Paredes, S. Parsa, D. S. Karp, and R. Chaplin-Kramer. 2022. Reply to Marini et al.: Insect spill-over is a double-edged sword in agriculture. Proceedings of the National Academy of Sciences (in press)
  • Type: Journal Articles Status: Published Year Published: 2022 Citation: Rosenheim JA, Cluff E, Lippey MK, Cass BN, Paredes D, Parsa S, Karp DS, Chaplin-Kramer R. Increasing crop field size does not consistently exacerbate insect pest problems. Proceedings of the National Academy of Sciences. 2022 Sep 13;119(37):e2208813119.
  • Type: Journal Articles Status: Published Year Published: 2022 Citation: Alexandridis N, Marion G, Chaplin-Kramer R, Dainese M, Ekroos J, Grab H, Jonsson M, Karp DS, Meyer C, O'Rourke ME, Pontarp M. Archetype models upscale understanding of natural pest control response to land-use change. Ecological Applications. 2022 Dec;32(8):e2696.
  • Type: Journal Articles Status: Published Year Published: 2022 Citation: Paredes, D., Rosenheim, J.A. and Karp, D.S., 2022. The causes and consequences of pest population variability in agricultural landscapes. Ecological Applications, p.e2607.
  • Type: Journal Articles Status: Published Year Published: 2022 Citation: Alexandridis N, Marion G, Chaplin-Kramer R, Dainese M, Ekroos J, Grab H, Jonsson M, Karp DS, Meyer C, O'Rourke ME, Pontarp M. Models of natural pest control: Towards predictions across agricultural landscapes. Biological control. 2021 Nov 1;163:104761.


Progress 09/01/20 to 08/31/21

Outputs
Target Audience: Our target audience is the scientists and tool developers who work with growers, land managers, and other decision-makers, who are constrained in their ability to reach these broader audiences by limitations in the data necessary to build the science and knowledge to confront pest control challenges. It is this information gap they face that our project is proposing to fill. We emphasize this set of stakeholders is so important because they have the interest and experience to conduct streamlined analyses and/or build new software decision-support systems to inform improved pest management, and are limited only by the data. To this end we have engaged 20 insect ecologists, 5 remote-sensing scientists, and 5 data scientists over the course of this year, selected based on their expertise, access to large datasets or products, and/or involvement in larger networks for their disciplines. These researchers have served as our first-wave participants in our Coordinated Innovation Network, offering ideas on data sources and advice on how to approach new partners, as well as feedback on the functionalities we are building into our platform. Changes/Problems: As already communicated to our program officer, we needed to change our contract for the software engineering of our platform to a sub-award. At the time that we wrote our proposal, our Software Architect of the Living Database, Richard Sharp (who wrote the Cyber-Infrastructure sections of the proposal and is leading all of that work), was in transition from his previous role at the Natural Capital Project at Stanford University to his current role as Director of Engineering at the tech non-profit SPRING. We included him as an independent contractor because SPRING did not yet legally exist, and by University of Minnesota policy that meant his scope of work was automatically classified as a "contract".
However, now that Sharp is affiliated with SPRING he wanted to bring this grant to that organization, not administer it individually as a private contractor. As per University of Minnesota policy, individuals who are major intellectual contributors to the project and who are based at organizations outside of the University (not independent contractors) are classified as "sub-recipients", not "contracts". This change delayed the initiation of that sub-award, and hence the spending on that aspect of the budget. However, Sharp has proceeded in his work on the project in good faith, and now that the sub-award has been made, he will be invoicing for past effort soon. This administrative delay has not slowed our work or our progress toward project deliverables and will not impact spending from here on out. Other delays in spending have been due to staffing challenges during COVID. We were eager for the insect postdoc to spend their time on the project in person with project PIs at UC Davis, and we were not comfortable asking anyone to relocate during the pandemic, so we delayed our hiring process until this past summer. The postdoc has begun as of this fall, but the spending will not show up in this reporting period. Likewise, we paused our search for a remote-sensing postdoc at the University of Minnesota, and in the process of that delay we began working with a remote-sensing specialist at the Natural Capital Project previously based at Stanford. Because we had already been working with her for several years and were comfortable working remotely, we hired her on a temporary/casual status at the University of Minnesota while she finished out her work at Stanford. She is now full-time at Minnesota, but shared with other projects within the Institute on the Environment. We believe this provides an advantage to the project, because having her employed part-time rather than full-time will enable us to keep her for the duration of the grant.
This delays our up-front spending on what would have been a postdoc in the first two years for this role, but the same amount will be spent over the 4 years of the project. What opportunities for training and professional development has the project provided? Nothing Reported How have the results been disseminated to communities of interest?Co-PI Karp gave the keynote at the Nekudat Hen agroecology conference in Israel, attended by 175 people, including scientists, growers, and industry representatives. What do you plan to do during the next reporting period to accomplish the goals?We have proceeded slightly out of order from our originally envisioned tasks. We are ahead of schedule on acquiring and processing of EO data and automating the extraction of EO variables, but behind on synthesizing and filling gaps in traits data (see Changes/Problems). We have not yet held in-person meetings for our Coordinated Innovation Network due to the pandemic, but as previously noted we find our virtual format of more frequent exchanges with a smaller number of collaborators to be extremely productive. Furthermore, we are already testing and iterating on the code base that forms the foundation for the living database, and will continue to do this as we proceed. On the whole, we don't foresee needing to make many adjustments to remain on track. In the next reporting period, we will focus on incorporating the traits data into our platform, and will begin preliminary analyses with datasets already in hand. We will continue to expand the insect and remote-sensing datasets with the help of our Coordinated Innovation Network. We will begin to build a web interface for the data cleaning and EO data extraction scripts to enable a broader user base, while continuing to test and adapt functionality in the python code base with advanced users. 
We will also expand our communication of early products of the platform, in presentations and the development of outreach materials to disseminate to broader audiences.

Impacts
What was accomplished under these goals?
IMPACT: The lack of a definitive answer to the question of how habitat around farmland impacts pests after two decades of research suggests a dramatically new approach is needed to understand spatial dynamics of insect populations. To address this challenge, we are building an open-source, standardized data platform for pest control analysis and prediction. In this first year of the project, we have begun amassing much larger, multi-year datasets for analysis (adding >55,000 observations, more than quadrupling the previous size of the database), built a replicable computational pipeline for extracting Earth observation variables for each field observation, and mobilized the insect ecology and remote-sensing communities to identify new potential data sources to further expand this living database. This work is streamlining the data collection process with powerful tools that enable more rapid formation and testing of new scientific hypotheses regarding how insects respond to their surrounding landscape.

Accomplishments under specific goals:
1) Expand the biocontrol database, seeking out large-scale, long-term datasets. Through consultation with the insect ecology experts in our Coordinated Innovation Network, we identified over 30 datasets that meet our criteria (>100 site-years, measuring either pest densities, damage, or predation/parasitism, georeferenced or with spatial locations known within ~1 km). We have obtained three of these datasets so far: >50,000 site-years with multiple observations per site per year (including pest densities, infestation rates, and crop damage over the growing season) from a national dataset in Spain, spanning 8 crops over 13 years; a combined >3200 site-years in California (>1200 over 11 years in citrus and ~3000 over 18 years in cotton); and >600 site-years in cotton in Australia over 5 years.
We have already begun to explore the largest of these datasets, obtained from RAIF in Spain, which resulted in a publication in Ecology Letters this year (Paredes et al. 2021) demonstrating how certain landscape effects can be masked by year-to-year variation, and that only through analysis of long-term datasets can the importance of landscape variables (in this case, productivity) to pest control be revealed.

2) Acquire relevant life history trait data matching major pests and natural enemies represented in the biocontrol database. We have not progressed on this objective in the first year because our hiring of a postdoc was delayed due to COVID (see "Changes/Problems"). We were able to recruit and hire someone over the summer, and she started this fall (after this reporting period). One of her main roles will be exploring traits databases and testing hypotheses related to how insect traits mediate insect response to landscape variables, and one of her preliminary interests is thermal tolerances.

3) Acquire Earth observations (EO) data for georeferenced locations of insect data. We developed a pipeline for extracting Earth observations hosted on Google Earth Engine (GEE) for sampling locations, and we used it to acquire a preliminary set of EO variables related to productivity for all sites in the original SESYNC dataset as well as in the new datasets described in Objective 1. This preliminary set of variables includes: Landsat-based (30 m) EVI (Enhanced Vegetation Index, a measure of "greenness" more suitable for the tropics) or NDVI (Normalized Difference Vegetation Index, a measure of "greenness" more suitable for temperate regions), for the sampling date or summarized over the growing season in terms of mean, minimum, maximum, and standard deviation; and MODIS-based (500 m) phenological variables such as greenup, mid-greenup, peak, maturity, mid-greendown, senescence, and dormancy.
We provide all of these variables for the measurement year and the year prior to measurement, at a range of buffer sizes (from 100 m to 10 km), and for crop and non-crop pixels within that buffer (using the National Land Cover Database and CORINE Land Cover dataset for the US and European sites, respectively). We have also identified additional products hosted on GEE, such as GPP (Gross Primary Production) and/or NPP (Net Primary Production), that may better indicate productivity in terms of plant physiological function rather than merely greenness, and LAI (Leaf Area Index) and/or FPAR (Fraction of Photosynthetically Active Radiation), the variability of which has been used as a proxy for agricultural intensity or disturbance. Through initial consultation with remote-sensing experts in our Coordinated Innovation Network, we have also identified promising new datasets to explore for the US-based sites in the next year, including hyperspectral data (to explore spectral diversity as a proxy for plant diversity) from AVIRIS and a related tool for processing those data, ImgSPEC.

4) Develop the software infrastructure to automate tasks involved in the above goals. We have developed prototype code in Python to undertake common data cleaning tasks in Objective 1 and streamline GEE extraction of different EO variables in Objective 3. Among the most common errors that need to be cleaned in large datasets are typos in variables like farm name or technician name, which are important to control for as random effects in analyses. By combining minimum edit distance algorithms with a geographic referencing system, we are able to group similar names occurring at the same or nearby sites, automating this error detection step in data cleaning and cutting the processing time down from days or weeks to minutes.
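The name-grouping step described above can be illustrated with a minimal sketch. This is not the project's code: the function names, the thresholds (2 edits, 1 km), and the use of Levenshtein distance with a haversine great-circle distance are assumptions chosen for illustration only.

```python
from math import radians, sin, cos, asin, sqrt


def edit_distance(a, b):
    """Levenshtein (minimum edit) distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two lat/lon points (degrees)."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    h = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(h))


def group_names(records, max_edits=2, max_km=1.0):
    """Group (name, lat, lon) records whose names are within max_edits of
    each other AND whose sites are within max_km. Thresholds are
    hypothetical defaults, not values taken from the project."""
    parent = list(range(len(records)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            name_i, lat_i, lon_i = records[i]
            name_j, lat_j, lon_j = records[j]
            close_name = edit_distance(name_i.lower(), name_j.lower()) <= max_edits
            close_site = haversine_km(lat_i, lon_i, lat_j, lon_j) <= max_km
            if close_name and close_site:
                parent[find(j)] = find(i)

    groups = {}
    for i, (name, _, _) in enumerate(records):
        groups.setdefault(find(i), []).append(name)
    return list(groups.values())
```

In this sketch, two records are merged only when both conditions hold, so an identical farm name at a distant site stays separate while a misspelled name at a neighboring site is flagged as a likely duplicate. The pairwise comparison is quadratic in the number of records; a real pipeline would probably bin records spatially first.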
To streamline the EO data extraction, we use an API connecting Python to GEE to circumvent many of the limitations within the GEE framework (including caps on the size of the dataset and the number of variables that can be exported). Now all desired variables for all sites can be exported for any desired buffers, matched to the year of measurement and to a land cover dataset based on the bounding box of the site geocoordinates. We have also added some data science experts to the Coordinated Innovation Network, including from the IPBES Task Force on Knowledge and Data and from Google X's data team, who are interested in collaborating on our datasets with some of their own machine learning models.

5) Encourage adoption of the living database across the research community. We are just beginning to have working prototypes that we can share with the research community, but we are first piloting them within our core team. Our main accomplishment in this objective has been the extended discussions we've had with the insect ecology experts and remote-sensing experts in the Coordinated Innovation Network, which helped us to identify new datasets and to sharpen the value proposition of what our platform could provide to new partners. Through these conversations, we developed a 1-page summary for potential data providers that we will begin disseminating through our network next year. We were unable to convene our entire Network in person this year due to the pandemic, but we believe these more intimate conversations to be more helpful at this early stage of the project.
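Two pieces of the EO extraction logic described under Objectives 3 and 4 can be sketched without a GEE connection: matching a site to a land cover dataset by bounding box, and summarizing a vegetation index over a growing season. The sketch below is illustrative only. The bounding boxes are rough, hypothetical extents rather than the official footprints of either product, the function names are invented here, and the choice of population standard deviation is an assumption. The NDVI formula itself is the standard (NIR - Red) / (NIR + Red).

```python
from statistics import mean, pstdev

# Hypothetical, approximate extents as (min_lon, min_lat, max_lon, max_lat).
# These are rough illustrations, not official dataset footprints.
LAND_COVER_BBOXES = {
    "NLCD": (-125.0, 24.0, -66.0, 50.0),   # conterminous US (approximate)
    "CORINE": (-25.0, 34.0, 45.0, 72.0),   # Europe (approximate)
}


def pick_land_cover(lon, lat):
    """Return the first land cover dataset whose box contains the site,
    or None if the site falls outside every box."""
    for name, (min_lon, min_lat, max_lon, max_lat) in LAND_COVER_BBOXES.items():
        if min_lon <= lon <= max_lon and min_lat <= lat <= max_lat:
            return name
    return None


def ndvi(nir, red):
    """Normalized Difference Vegetation Index from NIR and red reflectance."""
    return (nir - red) / (nir + red)


def seasonal_summary(values):
    """Mean, minimum, maximum, and (population) standard deviation of a
    vegetation-index time series over one growing season."""
    return {
        "mean": mean(values),
        "min": min(values),
        "max": max(values),
        "std": pstdev(values),
    }
```

For example, `pick_land_cover(-121.74, 38.54)` selects the US product for a site near Davis, California, and a site outside both boxes (such as one in Australia) returns `None`, signaling that another land cover source would be needed.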

Publications

  • Type: Journal Articles Status: Under Review Year Published: 2022 Citation: Alexandridis, N; Marion, G; Ekroos, J; Pontarp, M; Poppenborg Martin, E A.; Chaplin-Kramer, R; Dainese, M; Grab, H; Jonsson, M; Karp, D, et al. (In Review). Archetype models for upscaling understanding of natural pest control response to land-use change. Ecological Applications.
  • Type: Journal Articles Status: Under Review Year Published: 2022 Citation: Paredes, D., J.A. Rosenheim, and D.S. Karp. (In Review). The causes and consequences of pest population stability in agricultural landscapes. Ecological Applications.
  • Type: Journal Articles Status: Published Year Published: 2021 Citation: Paredes, D., Rosenheim, J. A., Chaplin-Kramer, R., Winter, S., & Karp, D. S. (2021). Landscape simplification increases vineyard pest outbreaks and insecticide use. Ecology Letters 24(1): 73-83.
  • Type: Journal Articles Status: Published Year Published: 2021 Citation: Alexandridis, N., Marion, G., Chaplin-Kramer, R., Dainese, M., Ekroos, J., Grab, H., Jonsson, M., Karp, D.S., et al. (2021). Models of natural pest control: Towards predictions across agricultural landscapes. Biological Control: 104761.
  • Type: Conference Papers and Presentations Status: Other Year Published: 2021 Citation: Co-PI Karp gave the keynote at the Nekudat Hen agroecology conference in Israel, attended by 175 people, including scientists, growers, and industry representatives.