DSFAS PARTNERSHIP: Leveraging automation and AI to predict critical periods for managing insect pests

Sponsoring Institution

National Institute of Food and Agriculture

Project Status

ACTIVE

Funding Source

AFRI COMPETITIVE GRANT

Reporting Frequency

Annual

Accession No.

1032253

Grant No.

2024-67022-42525

Cumulative Award Amt.

$722,356.00

Proposal No.

2023-11647

Multistate No.

(N/A)

Project Start Date

Sep 1, 2024

Project End Date

Aug 31, 2028

Grant Year

2024

Program Code

[A1541]- Food and Agriculture Cyberinformatics and Tools

Recipient Organization
UNIVERSITY OF DELAWARE
(N/A)
NEWARK, DE 19717

Performing Department
(N/A)

Non Technical Summary
Growers rely on extension agents to inform them of appropriate methodologies and timing for pest management practices. In turn, extension agents often rely on decision-support tools to provide predictions of pest phenology. Although phenological modeling has recently seen major advances, largely due to sophisticated machine learning models, those models remain largely inaccessible to extension agents since they require extensive data science expertise to use and are not yet integrated into existing decision support tools. Here, we propose the development of a decision-support platform that automates the use of machine learning models to make predictions about pest phenology. The central objectives of this project are to perform research and development for machine learning models yielding optimal predictive performance of pest phenology, develop a containerized software platform, validate the results with collection of new data, and communicate with end users for feedback and training. As a test case, we will utilize 18 years of data on three aphid pests from the Midwest Suction Trap Network, creating a web-ready decision-support tool for these pests. The tool will be built using modern best practices in software development, facilitating adaptation to other pests for which similar datasets exist. This proposed project aligns with the DSFAS Program Area Priority with a focus on the Plant Health and Production and Plant Products NIFA Priority Area by integrating artificial intelligence with modern software engineering technologies to maximize utilization of existing data and helping farmers make informed and economical decisions about managing resources to control aphid pests.

Animal Health Component

Research Effort Categories

Basic

25%

Applied

Developmental

75%

Classification

Knowledge Area (KA)	Subject of Investigation (SOI)	Field of Science (FOS)	Percent
216	1510	1070	33%
216	1310	1070	33%
216	1599	1070	34%

Knowledge Area
216 - Integrated Pest Management Systems;

Subject Of Investigation
1310 - Potato; 1510 - Corn; 1599 - Grain crops, general/other;

Field Of Science
1070 - Ecology;

Keywords

climate change

sustainable food

Goals / Objectives
The goal of this project is to develop a decision-support platform that automates the use of machine learning models to make predictions about pest phenology. Our core objectives are: 1) exploration of machine learning models yielding optimal predictive performance, including exploration of a wide array of weather and climatic input features; 2) development of an automated, containerized, and web-ready decision-support tool that produces visually intuitive predictions for the timing of critical phenological events; and 3) collection of new data for model validation and communication to potential end users. As a test case, we will utilize 18 years of data on three aphid pests from the Midwest Suction Trap Network, creating a web-ready decision-support tool for these pests.

Project Methods
Objective 1

1a. Explore model architectures yielding best performance
We will engage in an in-depth exploration of modeling approaches to determine the architecture that provides the best possible performance in predicting the key elements of aphid phenology most useful for pest management.
We will compare direct and indirect methods of modeling phenological indicators. Indirect methods reconstruct complete abundance curves from which phenological indicators are extracted. Methods to do so include general linear models (GLMs) and generalized additive models (GAMs), as well as more flexible non-linear models within the purview of machine learning, such as random forests, gradient boosted machines (GBMs), and artificial neural networks (ANNs). In contrast, direct methods model the phenological indicators explicitly, without reconstructing phenology curves.
The basic approach we propose is to train models for each of these architectures on data for each of the three aphid species and to use cross-validation to obtain unbiased estimates of model performance. For each model architecture, we intend to use state-of-the-art hyperparameter tuning methodologies with the multivariate optimizer Optuna, and to explore the feature set yielding the most predictive models.

1b. Explore input feature sets yielding best performance
Exploring feature sets is orthogonal to exploring model architectures, in that we will conduct a complete exploration of feature sets for each model architecture described in 1a. We propose to cast a wide net, including a diversity of weather and geographic features as well as historical and early-season pest phenology. The majority of effort with respect to input features will be spent harvesting and transforming raw data for use in modeling.
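To illustrate the kind of raw-data transformation this involves, the sketch below derives three features from daily weather records: an expanding-window degree-day sum, a sliding-window precipitation total, and a simple lag. It is a minimal Python sketch under stated assumptions, not the project's pipeline. The function names, the 10 °C base temperature, and the toy values are all hypothetical, and degree days are computed from daily mean temperature only.

```python
from typing import List, Optional, Sequence


def cumulative_degree_days(mean_temps_c: Sequence[float],
                           base_temp_c: float = 10.0) -> List[float]:
    """Expanding-window aggregation: running sum of daily mean
    temperature above a base threshold (base value is an assumption)."""
    total = 0.0
    out: List[float] = []
    for temp in mean_temps_c:
        total += max(0.0, temp - base_temp_c)
        out.append(total)
    return out


def rolling_sum(values: Sequence[float], window: int) -> List[float]:
    """Sliding-window aggregation: sum over the current day and up to
    (window - 1) preceding days."""
    if window < 1:
        raise ValueError("window must be at least 1")
    return [sum(values[max(0, i - window + 1): i + 1])
            for i in range(len(values))]


def lagged(values: Sequence[float], n_days: int) -> List[Optional[float]]:
    """Lag feature: the value observed n_days earlier (None if unavailable)."""
    if n_days < 0:
        raise ValueError("n_days must be non-negative")
    return [values[i - n_days] if i >= n_days else None
            for i in range(len(values))]


if __name__ == "__main__":
    # Toy daily records (hypothetical values, not project data).
    temps = [8.0, 12.0, 15.0, 9.0, 14.0]   # daily mean temperature, deg C
    precip = [0.0, 5.0, 1.0, 0.0, 3.0]     # daily precipitation, mm

    print(cumulative_degree_days(temps))   # [0.0, 2.0, 7.0, 7.0, 11.0]
    print(rolling_sum(precip, window=3))   # [0.0, 5.0, 6.0, 6.0, 4.0]
    print(lagged(temps, n_days=1))         # [None, 8.0, 12.0, 15.0, 9.0]
```

Each function returns one value per day, so the outputs can be joined column-wise with trap dates to form a modeling table.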
Once features are model-ready, we will use established methods for evaluating feature importance (e.g., SHAP, LIME) to determine the most valuable features for achieving performant models.
Weather features will include, but are not limited to, temperature, precipitation, and humidity, and will be used in the form of lags and window aggregations on these variables. Cumulative degree days, easily the most commonly used predictor in pest phenology models, is itself an expanding window aggregation on temperature, summing temperature above a given threshold over the days preceding a given time point. Similarly, we will include sliding and expanding window aggregations on other weather variables, for example calculating the total amount of precipitation over the whole season or over a set number of days preceding sampling. Geographical features, which will vary only across space, may include variables like elevation, aspect, and percent agricultural land in the surrounding area. Pest data may include variables such as peak abundance in the prior year and date of first detection.

Objective 2

2a. Automate weather data download, dataset transformation, and phenology prediction
We will build a codebase of programmatic routines to automate 1) the download of recent weather and geographic data, 2) the transformation of all relevant data into a scoring dataset for prediction (i.e., creating all features used in model training), and 3) prediction of pest abundance through time. Based on a user-selected location, weather data will be harvested from the ASOS weather station system and geographic data (e.g., elevation) will be harvested from publicly available APIs like the USGS National Map services. Our scripts will combine and transform the weather and geographic data into a scoring dataset as dictated by the features used in model training. Finally, the scripts will use pre-trained models from Objective 1 to make phenological predictions for presentation to the user.

2b. Design and develop user interface
The design of a dashboard-style user interface will be a collaborative process between scientists at partnering universities, developers at EcoData Technology, and end users of the tool. We will conduct interviews with individuals and small groups (2-5) of extension agents to elicit details and nuances of their workflows and to determine the most critical features for inclusion in the interface. With a detailed dashboard design in hand, software developers at EcoData Technology will develop a frontend application with the desired features that will present phenological predictions along with raw weather and geographic data to the end users.

2c. Containerize application for delivery and centralized web hosting with CI/CD
We will combine the backend and frontend components described above into a single Docker container. We will then host the container in the cloud on Amazon Web Services and make it publicly available, both for use by our engineers during development and for extension agents for the purposes of user-acceptance testing and soliciting feedback for improvements to the tool. Additionally, we will set up CI/CD (continuous integration / continuous deployment) to automate the testing and deployment of incremental changes to the application.

Objective 3

3a. Collection of aphid data over the course of the project
New aphid data will be collected by groups working with co-PI Seiter and co-PI Groves, directly coordinating with staff and volunteers that contribute samples to the Midwest Suction Trap Network. Data will be contributed to the repository managed by the network.

3b. Model validation and retraining
The new data collected by the suction trap network described above will be used to validate the predictions of models built in early phases of this project (Objective 1).
Furthermore, during each year of the grant, we will refit models with the new data to ensure models are up to date and provide the best possible predictions as the tool moves into production.
We will perform a model validation and retraining effort with each year of newly collected aphid data (3a). The approach, following best practices in data science, is to make predictions about new time periods (the "validation set" or "test set") with old models that were trained using data only from prior periods (the "training set"). For the purposes of communication to end users, we will use intuitive metrics of performance, either on the scale of the response (e.g., mean absolute error, MAE, in units of time) or as percentages (mean absolute percentage error, MAPE). These will be prepared into graphics, potentially comparing current performance with that of status quo methodologies.

3c. User outreach, training, and product evaluation
To ensure we create an application that is accurate, intuitive, and useful for extension professionals, we will engage extensively with extension users throughout all phases of this project. We primarily plan to communicate with extension users through our 16-member advisory board. In addition to initial design consultations, once a prototype of the web application is complete, we will share it with our advisory board and solicit feedback. Via annual meetings and emails, we will distribute the aphid forecasting platform to advisory board members, who will then share the application with their local networks of extension specialists and county agents. We will also disseminate the application through the University of Delaware, University of Wisconsin, and University of Illinois via the extension programs and colleagues of the PI and co-PIs.
In the final year, we will provide a 'deeper-dive' training workshop for extension personnel, including a survey rating the utility and usability of the web tool. To facilitate sustainability of this tool, we will invite workshop participation from potential future collaborators (e.g., University of Minnesota 'Aphid Alert' network, https://aphidalert.umn.edu/). Based on guidance from our advisory panel and extension collaborators, we will produce training materials (e.g., user guide, FAQs, tutorials) accessible from the web platform.
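The forward-in-time validation described in 3b (predict each new year with a model trained only on earlier years, then report MAE and MAPE) can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's code. The record layout, the function names, and the toy numbers are hypothetical, and the mean-response model is only a stand-in for the architectures explored in Objective 1.

```python
from statistics import mean
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical record layout: (year, feature value, observed response),
# e.g. the response could be the day of year of peak aphid capture.
Record = Tuple[int, float, float]
Model = Callable[[float], float]


def mae(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute error, in the units of the response."""
    return mean(abs(o - p) for o, p in zip(observed, predicted))


def mape(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error; observed values must be non-zero."""
    return 100.0 * mean(abs(o - p) / abs(o)
                        for o, p in zip(observed, predicted))


def fit_mean_model(train: Sequence[Record]) -> Model:
    """Stand-in model (assumption): always predicts the training mean."""
    average = mean(response for _, _, response in train)
    return lambda feature: average


def forward_validation(records: Sequence[Record],
                       fit: Callable[[Sequence[Record]], Model] = fit_mean_model,
                       min_train_years: int = 2) -> Dict[int, Dict[str, float]]:
    """Score each year using a model fit only on strictly earlier years."""
    years = sorted({year for year, _, _ in records})
    results: Dict[int, Dict[str, float]] = {}
    for test_year in years[min_train_years:]:
        train = [r for r in records if r[0] < test_year]   # prior periods only
        test = [r for r in records if r[0] == test_year]   # new period
        model = fit(train)
        observed: List[float] = [r[2] for r in test]
        predicted: List[float] = [model(r[1]) for r in test]
        results[test_year] = {"mae": mae(observed, predicted),
                              "mape": mape(observed, predicted)}
    return results


if __name__ == "__main__":
    # Toy data (hypothetical): two trap sites per year.
    toy = [(2021, 0.0, 200.0), (2021, 0.0, 210.0),
           (2022, 0.0, 190.0), (2022, 0.0, 200.0),
           (2023, 0.0, 200.0), (2023, 0.0, 250.0)]
    for year, scores in forward_validation(toy).items():
        print(year, scores)
```

Because the training set never includes the year being scored, the reported errors approximate how the tool would perform on a season it has not yet seen. Any function that takes training records and returns a callable can be passed as `fit`.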

Progress 09/01/24 to 08/31/25

Outputs
Target Audience: Our target audience for Year 1 comprised growers and extension specialists, as well as entomologists/modelers. We conducted 11 interviews with growers/administrators representing several state commodity groups to understand the needs and opportunities for pest forecasting tools. Data from the suction trap network were gathered by more than 30 growers and extension specialists, and were downloaded by 166 users.
Changes/Problems: Nothing Reported
What opportunities for training and professional development has the project provided?
EcoData staff involved in this effort already had experience in modeling pest insect outbreaks and building web tools prior to the project, but did not yet have experience with aphid pests of row crops in the US Midwest. By working on this project, these members of the larger project team gained experience with a new group of insect pests. A new PhD student was recruited to the University of Delaware to begin work identifying the most informative weather- and landscape-based covariates for predicting the timing and magnitude of aphid incursions into crop fields. This student was able to begin developing her quantitative and critical thinking skills, and will continue to develop professional and analytical skills over the course of the project. A research scientist supported by this project was able to continue her leadership and coordination of the Midwest Suction Trap Network, and trained four undergraduate students from the University of Illinois in the taxonomy/systematics of agriculturally important insects.
How have the results been disseminated to communities of interest?
Preliminary modeling results were shared with the entire research group in an all-hands meeting in March 2025.
EcoData staff have also initiated 11 interviews with potential stakeholders that belong to commodity boards in the US Midwest, such as the Illinois Soybean Board, and solicited preliminary feedback on the utility of predictive models for soybean aphids and related soybean pests. We have also shared the suction trap data with our 28 suction trap network collaborators on a weekly basis during the sampling season. Also, in collaboration with Joseph LaForest (Department of Entomology, Center for Invasive Species and Ecosystem Health, University of Georgia; Southern IPM Center), a voluminous collection of records of insect crop pests from the STN is available at https://suctiontrapnetwork.org/ and https://www.eddmaps.org. These data have been downloaded by 166 users.
What do you plan to do during the next reporting period to accomplish the goals?
For the last two objectives of the project, EcoData staff will update the modeling infrastructure so that it can be supported by a web-based decision-support tool. Concurrent with this process in 2025-2026, EcoData staff will schedule and organize interviews with staff of commodity boards in the US Midwest and Mid-Atlantic for crops that are affected by soybean aphid, bird cherry-oat aphid, and green peach aphid. The PhD student recruited for the project will examine relationships between the timing and magnitude of aphid captures and weather- and landscape-related covariates, which will ultimately be used to improve forecasts by the tool that we develop. We will continue sampling aphids across the Midwest Suction Trap Network at weekly intervals through October 2025, starting again in May 2026.

Impacts
What was accomplished under these goals?
As part of the modeling efforts in Objective 1, EcoData staff have completed the development of code infrastructure in R to analyze insect data from the suction trap network and make forecasts using random forest/generalized boosted models. By including climate-based thresholds to predict phenology, weather conditions can be used to make predictions of insect abundance for the most abundant species found in the suction trap network dataset. For Objective 3, we have begun collecting aphid data from 28 suction traps in 10 states in the US Midwest. Specimens and data collected to date have been used for multiple research studies, not only on aphids, but also on other insects such as soybean thrips (vector of soybean vein necrosis virus, SVNV) and corn leafhopper (vector of corn stunt disease). The operation of the STN was made possible thanks to the devoted collaboration of a variety of farmers, extension, and research personnel. In 2024, we collected 594 suction trap samples between 17 May and 18 October. In 2025, we have collected 305 samples between 16 May and 25 July, and sampling is ongoing. We have shared the suction trap data with our 28 suction trap network collaborators on a weekly basis during the sampling season. Also, in collaboration with Joseph LaForest (Department of Entomology, Center for Invasive Species and Ecosystem Health, University of Georgia; Southern IPM Center), a voluminous collection of records of insect crop pests from the STN is available at https://suctiontrapnetwork.org/ and https://www.eddmaps.org. These data allow for studies on the distribution of known species and of new or unidentified species captured by the suction traps. The most diverse genera identified in suction trap samples were Anoecia, Aphis, Pemphigus, Rhopalosiphum, and Uroleucon. The most abundant aphid species were the cereal aphids (Melanaphis sacchari, Rhopalosiphum padi, R. maidis, R. rufiabdominale, Sitobion avenae, and Schizaphis graminum).
Stored samples from the traps are also available to the research community for further studies. As more data become available, the Southern IPM Center will continue to provide a long-term system to record data, generate visualizations for basic data exploration, and make the data available for other research and modeling efforts.

Publications

Type: Peer Reviewed Journal Articles Status: Published Year Published: 2024 Citation: Lagos-Kutz, D.M., Clark, R.B., Seiter, N., Clough, S.J., Hartman, G.L., Crossley, M. 2024. Tracking flight activity of potato leafhoppers (Hemiptera: Cicadellidae) with the Midwest Suction Trap Network. Environmental Entomology, 53:433-441. https://doi.org/10.1093/ee/nvae023