Recipient Organization
NORTH CAROLINA STATE UNIV
(N/A)
RALEIGH,NC 27695
Performing Department
(N/A)
Non Technical Summary
The spread of invasive plant pests and pathogens (hereafter pests) is a well-known ecological and economic threat to agriculture, responsible for 10-40% of crop yield losses globally and an estimated $40 billion in production losses each year in the United States. The threat pests pose to food security is expected to increase due to climate change and the global nature of trade and travel. Although many pests are under regulatory control to prevent and mitigate outbreaks, control or eradication after pest establishment can be resource-intensive, and success requires rapid detection and effective implementation of appropriate strategies. Even preventative pesticide measures can be costly and environmentally damaging if over-applied and ineffective if under-applied or incorrectly timed, all with the potential for promoting pesticide resistance and loss of natural controls. Rapid responses and data-driven decision support tools are therefore essential for understanding and mitigating threats posed by damaging agricultural pests. However, sparse data typically limit the accuracy and iterative improvement of pest spread models. This research will couple advances in object detection using machine learning with widely available crowdsourced and satellite imagery to build an automated, repeatable process for expanding mapping efforts of susceptible crops (i.e., host species), which are essential to forecasting pest spread accurately and across multiple scales. The resulting maps of host species will improve pest spread models by addressing sparse data concerns and reducing delays in data availability, thereby enabling the continuous improvement of pest spread forecasts and shortening time to decision making. We will focus on several economically and culturally significant fruit and tree nut species threatened by emerging pests and climate change.
We will collaborate with USDA Animal and Plant Health Inspection Service (APHIS), USDA Agricultural Research Service (ARS), state departments of agriculture, and growers associations to: (1) identify key pest threats to fruit and tree nut crops, (2) iteratively develop and validate host species maps and model forecasts, (3) continue co-developing our user-friendly decision support tool, the PoPS Forecasting Platform, and (4) add an alert system that translates forecasts and simulations into actionable insights for crop protection. The iterative near-term forecasting system, coupled with data inputs enhanced using machine learning, will reduce costs for pest surveys and help growers identify when and where to intervene to protect their crops, thus reducing production losses and chemical pesticide inputs.
Animal Health Component
60%
Research Effort Categories
Basic
10%
Applied
60%
Developmental
30%
Goals / Objectives
Rapid responses and data-driven decision support tools are essential for understanding and mitigating threats posed by agricultural pests and pathogens. However, sparse data typically limit the accuracy and iterative improvement of pest spread models. The major goal of this project is to improve agricultural pest management decision support tools for stakeholders by coupling machine learning enhanced host maps with iterative near-term forecasts of pest spread. Our iterative near-term forecasting system, coupled with data inputs enhanced using machine learning, will reduce costs for pest surveys and help growers identify when and where to intervene to protect their crops, thus reducing production losses and chemical pesticide inputs. To achieve this goal, our objectives are:
Objective 1: Connect geotagged images with remote sensing technologies to create inventories of specialty crop hosts at scale.
Creating geospatial inventories of host species at scale is challenged by data availability. Smaller-scale static maps can be created from time-intensive ground surveys or aerial imagery but are difficult to create and iteratively update over time at regional or country scales. Automating the process by leveraging and integrating remote sensing data portals, vetted crowdsourced observational data, and cloud computing can improve forecasting accuracy, reduce uncertainty, and, importantly, decrease the time required to get predictions of pest spread. Specifically, we will use machine learning to identify host species and location from an image source using object detection and feed these new host location points to a machine learning algorithm to classify host area. To accomplish this objective, we will:
(a) Develop an object detection algorithm to identify host species from open imagery data to expand spatial extent and reduce latency in species observations
(b) Expand the satellite-based image classification algorithm to scale host mapping
(c) Assess accuracy of image classification through the collection of ground truth information from stakeholder engagement
Objective 2: Develop and compare iterative, near-term forecasts of pest spread affecting stakeholder-identified fruit and nut trees with and without enhanced host maps.
Engaging with stakeholders to fully understand their needs and what information is required to make informed decisions facilitates the development of systems that better meet those needs and provide management-relevant information. Here we will build on our current partnerships with USDA APHIS and state departments of agriculture, as well as build new partnerships with growers associations and local producers, to collaborate at all stakeholder levels. We also aim to speed up the data integration process by incorporating multi-scale data from our host mapping process, open citizen science data, and environmental drivers. We will quantify changes in forecast speed (i.e., time to delivery to decision makers) and accuracy due to improved data acquisition and integration of multi-scale data. To accomplish this objective, we will:
(a) Create forecasts of pest spread affecting stakeholder-identified crops
(b) Quantify change in accuracy and precision of forecasts with updated host maps
Project Methods
Forecasting the spread of pests requires an understanding of the complex interactions between a pest, its host species, and environmental conditions. Sparse data, especially reference data as climate conditions change, are a well-known challenge facing the improvement and robustness of pest models. Spatial maps of host locations are a critical input to forecasts and are currently time-consuming to create. Methods for mapping host locations must balance trade-offs between acquisition cost and effort, spatial extent, spatial and temporal resolution, and computational requirements. We will use machine learning to identify host species and location from an image source using object detection and feed these new host location points to an ML algorithm to classify host area.
For our first objective (i.e., create inventories of specialty crop hosts at scale), we will acquire geotagged imagery of host species from iNaturalist, GBIF, and the Early Detection and Distribution Mapping System (EDDMapS) using their respective application programming interfaces (APIs). Data from these sources will include the image file and associated metadata including but not limited to location (i.e., coordinates), date of image capture, and species name. We will evaluate image and species identification quality and deduplicate records using common metadata tags and outlier checks. Data meeting quality thresholds will be used for two related classification tasks.
The first algorithm will use the image, along with researcher-generated bounding boxes and the species label, to train an object detection algorithm to locate and classify a host species from an image.
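The quality screening and deduplication step described above could look roughly like the following sketch. It is illustrative only: the `Observation` record, its field names, the (0, 0) placeholder check, and the four-decimal coordinate rounding are assumptions made for this example, not the project's actual schema or thresholds, and no real API calls are shown.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Observation:
    """Hypothetical record shared across sources (field names are assumptions)."""
    source: str          # e.g. "iNaturalist", "GBIF", "EDDMapS"
    species: str
    lat: float
    lon: float
    observed_on: date
    image_url: str


def is_plausible(obs: Observation) -> bool:
    """Simple outlier check: valid coordinate ranges, and not the (0, 0) placeholder."""
    in_range = -90.0 <= obs.lat <= 90.0 and -180.0 <= obs.lon <= 180.0
    return in_range and not (obs.lat == 0.0 and obs.lon == 0.0)


def dedup_key(obs: Observation, decimals: int = 4) -> tuple:
    """Records with the same species, date, and rounded coordinates count as duplicates."""
    return (
        obs.species.strip().lower(),
        obs.observed_on,
        round(obs.lat, decimals),
        round(obs.lon, decimals),
    )


def clean(records: list[Observation]) -> list[Observation]:
    """Drop implausible records, then keep the first record seen for each key."""
    seen = set()
    kept = []
    for obs in records:
        if not is_plausible(obs):
            continue
        key = dedup_key(obs)
        if key in seen:
            continue
        seen.add(key)
        kept.append(obs)
    return kept
```

Keying on shared metadata rather than on the image file itself lets the same sighting reported to two portals collapse into one record, at the cost of occasionally merging distinct observations made at the same place on the same day.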
To increase the available amount of annotated data, we will also leverage recently created open data relevant to agriculture and plant pests that have been curated and annotated in concert by volunteers and experts.
We place specific focus on deep convolutional neural networks (CNNs), given their success and accuracies above 70% in similar image classification and object detection efforts. We will also leverage transfer learning, in which a model trained for a similar task on a large dataset serves as the starting point for a second model, to help accelerate model training. Data will be pre-processed (e.g., resized to a uniform input size) and separated into training, validation, and test datasets. Image augmentations will be applied to training data to expand the variety of available images for training and increase the model's ability to generalize; these augmentations may include adjusting contrast or brightness, reducing image quality, and rotation. Model performance will be evaluated by examining classification (e.g., accuracy, precision, recall) and object detection (e.g., mean average precision) metrics, as well as visualizations to evaluate under- or overfitting. Model development will also focus on trying different network architectures and regularization techniques to minimize overfitting, and evaluating subsequent impacts on computational requirements and evaluation metrics. To increase the amount of data available for a particular host species, a second development phase will explore using the geotagged iNaturalist, GBIF, and EDDMapS data as automated data labels where they spatially coincide with Google Street View (GSV) and OpenStreetMap imagery data, in addition to potentially manually tagging GSV data.
Once developed, the classifier will be applied to unseen data from GSV to estimate host species class probabilities. For images meeting or exceeding a specified threshold (e.g., >50% likely to be the species of interest), we will convert the object bounding box to a geographic coordinate and verify classification, either with spatially coincident citizen science data or by randomly selecting a subset of images and manually validating in collaboration with stakeholders on the ground and through image interpretation of a stratified random sample of points. Positively identified outputs will then become additional point data for occurrence host maps and the raster presence/absence algorithm.
The second algorithm will automate, validate, and expand the classification process for specialty crops in commercial production and in developed areas using high-resolution satellite imagery and open data. Specifically, the host species presence data queried from geotagged imagery above will also serve as automated labeled points for pixels in satellite images (e.g., Landsat, Sentinel, Planet) to scale host presence maps to state and regional areas. We will use a classifier previously developed by our team that applies repeat remote sensing observations in a Bayesian framework to provide uncertainty estimates of species presence; these estimates are converted to binary presence/absence maps at 10-m or 30-m spatial resolution and then rescaled to percent area mean and standard deviation maps at a stakeholder-selected resolution. The area under the curve (AUC) of the receiver operating characteristic (ROC) curve will be used to evaluate classification performance across thresholds.
In addition to checking agreement between predictions based on unlabeled GSV data and broad categories in coarser resolution products like the NLCD (e.g., cultivated crops, deciduous forest) and crop classes in CDL when available (e.g., grapes, peaches), we also propose ground-truthing a percentage of labeled images that are in locations near USDA APHIS or state department of agriculture field surveys.
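Two of the evaluation ideas above, applying a probability threshold to classifier output and summarizing performance across all thresholds with ROC AUC, can be sketched in a few lines. This is a generic illustration rather than the project's evaluation code: the function names are hypothetical, and AUC is computed from its rank interpretation (the probability that a randomly chosen presence scores higher than a randomly chosen absence), which is simple but quadratic in the number of samples.

```python
def presence_mask(scores: list[float], threshold: float = 0.5) -> list[int]:
    """Mark a prediction as presence (1) when its probability meets or exceeds the threshold."""
    return [1 if s >= threshold else 0 for s in scores]


def roc_auc(labels: list[int], scores: list[float]) -> float:
    """ROC AUC as the share of (presence, absence) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one presence and one absence label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Made-up validation labels and classifier probabilities, for illustration only.
    truth = [1, 1, 0, 0]
    probs = [0.9, 0.4, 0.6, 0.1]
    print(presence_mask(probs), roc_auc(truth, probs))
```

Because AUC does not depend on any single cutoff, it complements the thresholded presence/absence map: the threshold decides which points are passed on as host locations, while AUC indicates how well the scores separate hosts from non-hosts overall.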
Stakeholders at USDA APHIS and state departments of agriculture will validate the classification of the image either by visually assessing the image or by visiting the location during a field work session. The coupling of locations with current field surveys is necessary to reduce the cost of manual validation. We will use the validation findings to inform data and model improvements as part of our iterative process.
For our second objective (i.e., develop and compare forecasts of pest spread with and without machine learning enhanced host maps), we will use the PoPS (Pest or Pathogen Spread) Forecasting System developed by our team to build forecasts for pests of fruit and nut trees. PoPS is an open-source, dynamic, process-based, spatially explicit model that forecasts the spread of pests based on current locations and environmental and biological conditions specific to the pest being modeled.
We will use the host mapping products created in objective 1, with fully specified spatially explicit uncertainties, to propagate uncertainty from the host maps through the PoPS forecasts. Our data integration protocol will draw a unique value for each stochastic simulation of the model for every raster cell; for example, if we run 100,000 simulations, each simulation will draw a unique host map from the mean and standard deviation and use that map for the entire duration of the simulation, for a total of 100,000 unique host maps. Providing probabilistic forecasts with fully specified uncertainties allows decision makers to target areas of high uncertainty for surveys and target high-probability areas for management. In this way, future decisions should have less uncertainty due to the linkage between surveys, management, and iterative forecasts.
We will also compare improvements in model performance using previous host maps and our new host maps (objective 1).
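The per-simulation host map draw described above can be sketched as follows. The text specifies only that each simulation draws from the per-cell mean and standard deviation; the normal distribution, the clipping to 0-100 percent host area, and the function names here are assumptions made for illustration and are not taken from PoPS itself.

```python
import random


def draw_host_map(mean, sd, rng):
    """Draw one host map: each cell is sampled from Normal(mean, sd), then clipped to 0-100%.

    `mean` and `sd` are equally shaped 2-D lists (rows of raster cells).
    The normal distribution and the clipping are assumptions of this sketch.
    """
    return [
        [min(100.0, max(0.0, rng.gauss(m, s))) for m, s in zip(mean_row, sd_row)]
        for mean_row, sd_row in zip(mean, sd)
    ]


def host_maps_for_runs(mean, sd, n_runs, seed=0):
    """Yield one independently drawn host map per stochastic simulation run."""
    rng = random.Random(seed)  # seeded so the set of maps is reproducible
    for _ in range(n_runs):
        yield draw_host_map(mean, sd, rng)


if __name__ == "__main__":
    # Tiny 2x2 raster of percent host area (made-up values).
    mean = [[50.0, 20.0], [0.0, 90.0]]
    sd = [[10.0, 0.0], [5.0, 15.0]]
    for run_id, host_map in enumerate(host_maps_for_runs(mean, sd, n_runs=3)):
        # Each run would use its own map, unchanged, for the whole simulation.
        print(run_id, host_map)
```

Using a generator keeps memory use flat: with 100,000 runs, each map is created when its simulation starts rather than all maps being held at once.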
Model performance or skill will be assessed using multiple metrics, as no single metric adequately captures the multiple dimensions of forecast accuracy. We will use root mean squared error (RMSE) to compare modeled population levels to observed population levels from field surveys. We will use accuracy, precision, recall/sensitivity, specificity, and odds ratio for overall model performance across the entire landscape.
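For reference, the metrics named above follow directly from paired observed and modeled values and from the four confusion-matrix counts. The sketch below uses the standard textbook definitions; the function names and the choice to return NaN for undefined ratios are assumptions of this example, not part of the PoPS codebase.

```python
import math


def rmse(observed: list[float], predicted: list[float]) -> float:
    """Root mean squared error between observed and modeled population levels."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def _ratio(numerator: float, denominator: float) -> float:
    """Return NaN instead of raising when a metric is undefined (zero denominator)."""
    return numerator / denominator if denominator else float("nan")


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Presence/absence skill metrics from confusion-matrix counts over all cells."""
    return {
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
        "precision": _ratio(tp, tp + fp),
        "recall": _ratio(tp, tp + fn),        # also called sensitivity
        "specificity": _ratio(tn, tn + fp),
        "odds_ratio": _ratio(tp * tn, fp * fn),
    }


if __name__ == "__main__":
    # Made-up counts: cells predicted infested vs. observed infested in a survey.
    print(classification_metrics(tp=40, fp=10, tn=45, fn=5))
    print(rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
```

Reporting these together matters because they fail in different ways: across a landscape where most cells are uninfested, accuracy and specificity can be high even when recall is poor, which is the case a grower most needs to know about.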