Performing Department
(N/A)
Non Technical Summary
Growers rely on extension agents to inform them of appropriate methodologies and timing for pest management practices. In turn, extension agents often rely on decision-support tools to provide predictions of pest phenology. Although phenological modeling has recently seen major advances, largely due to sophisticated machine learning models, those models remain largely inaccessible to extension agents since they require extensive data science expertise to use and are not yet integrated into existing decision support tools. Here, we propose the development of a decision-support platform that automates the use of machine learning models to make predictions about pest phenology. The central objectives of this project are to perform research and development for machine learning models yielding optimal predictive performance of pest phenology, develop a containerized software platform, validate the results with collection of new data, and communicate with end users for feedback and training. As a test case, we will utilize 18 years of data on three aphid pests from the Midwest Suction Trap Network, creating a web-ready decision-support tool for these pests. The tool will be built using modern best practices in software development, facilitating adaptation to other pests for which similar datasets exist. This proposed project aligns with the DSFAS Program Area Priority with a focus on the Plant Health and Production and Plant Products NIFA Priority Area by integrating artificial intelligence with modern software engineering technologies to maximize utilization of existing data and helping farmers make informed and economical decisions about managing resources to control aphid pests.
Animal Health Component
0%
Research Effort Categories
Basic
25%
Applied
0%
Developmental
75%
Goals / Objectives
The goal of this project is to develop a decision-support platform that automates the use of machine learning models to make predictions about pest phenology. Our core objectives entail: 1) exploration of machine learning models yielding optimal predictive performance, including exploration of a wide array of weather and climatic input features, 2) development of an automated, containerized, and web-ready decision-support tool that produces visually intuitive predictions for the timing of critical phenological events, and 3) collection of new data for model validation and communication to potential end users. As a test case, we will utilize 18 years of data on three aphid pests from the Midwest Suction Trap Network, creating a web-ready decision-support tool for these pests.
Project Methods
Objective 1
1a. Explore model architectures yielding best performance
We will engage in an in-depth exploration of modeling approaches to determine the architecture that provides the best possible performance in predicting the key elements of aphid phenology most useful for pest management.
We will compare direct and indirect methods of modeling phenological indicators. Indirect methods reconstruct complete abundance curves from which phenological indicators are extracted. Methods to do so include general linear models (GLMs) and generalized additive models (GAMs), as well as more flexible non-linear models within the purview of machine learning, such as random forests, gradient boosted machines (GBMs), and artificial neural networks (ANNs). In contrast, direct methods model the phenological indicators explicitly, without reconstructing phenology curves.
The basic approach we propose is to train models for each of these architectures on data for each of the three aphid species and to use cross-validation to obtain unbiased estimates of model performance. For each model architecture, we intend to use state-of-the-art hyperparameter tuning methodologies with the multivariate optimizer Optuna, and to explore the feature set yielding the most predictive models.
1b. Explore input feature sets yielding best performance
Exploring feature sets is orthogonal to exploring model architectures in that we will conduct a complete exploration of feature sets for each model architecture described above in 1a. We propose to cast a wide net, including a diversity of weather and geographic features as well as historical and early-season pest phenology. The majority of effort with respect to input features will be spent harvesting and transforming raw data for use in modeling.
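As one illustration of transforming raw weather data into model-ready features, the sketch below computes three kinds of features named in this objective: cumulative degree days (an expanding-window sum of temperature above a threshold), a sliding-window precipitation total, and a simple lag. It is a minimal sketch, not the project's codebase. The function names, the 10-degree base temperature, and the sample values are all hypothetical, and the degree-day calculation assumes daily mean temperatures with the excess floored at zero.

```python
from collections import deque
from typing import Iterable, List, Optional, Sequence


def cumulative_degree_days(daily_mean_temps: Iterable[float],
                           base_temp: float) -> List[float]:
    """Expanding-window sum of daily temperature in excess of base_temp.

    Days at or below the threshold contribute zero (an assumption of this sketch).
    """
    total = 0.0
    out = []
    for temp in daily_mean_temps:
        total += max(temp - base_temp, 0.0)
        out.append(total)
    return out


def sliding_window_sum(values: Iterable[float], window: int) -> List[float]:
    """Sum of the current value and up to (window - 1) preceding values."""
    if window < 1:
        raise ValueError("window must be at least 1")
    recent = deque(maxlen=window)  # oldest value drops out automatically
    out = []
    for value in values:
        recent.append(value)
        out.append(sum(recent))
    return out


def lagged(values: Sequence[float], k: int) -> List[Optional[float]]:
    """Value observed k days earlier; None where no earlier value exists."""
    if k < 0:
        raise ValueError("k must be non-negative")
    return [values[i - k] if i >= k else None for i in range(len(values))]


if __name__ == "__main__":
    temps = [8.0, 12.0, 15.0, 9.0, 14.0]    # hypothetical daily mean temperatures
    precip = [0.0, 5.0, 1.0, 0.0, 3.0]      # hypothetical daily precipitation
    print(cumulative_degree_days(temps, base_temp=10.0))
    print(sliding_window_sum(precip, window=3))
    print(lagged(temps, k=1))
```

Each function returns one value per day, so the outputs can be joined column-wise with the trap data by date when assembling a training or scoring dataset.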
Once features are model-ready, we will use established methods for evaluating feature importance (e.g., SHAP, LIME) to determine the most valuable features for achieving performant models.
Weather features will include, but are not limited to, temperature, precipitation, and humidity, and will be used in the form of lags and window aggregations on these variables. Cumulative degree days, easily the most commonly used predictor in pest phenology models, is itself an expanding window aggregation on temperature, summing temperature above a given threshold over the days preceding a given time point. Similarly, we will include sliding and expanding window aggregations on other weather variables, for example calculating the total amount of precipitation over the whole season or over a set number of days preceding sampling. Geographical features, which vary only across space, may include variables like elevation, aspect, and percent agricultural land in the surrounding area. Pest data may include variables such as peak abundance in the prior year and date of first detection.
Objective 2
2a. Automate weather data download, dataset transformation, and phenology prediction
We will build a codebase of programmatic routines to automate 1) the download of recent weather and geographic data, 2) the transformation of all relevant data into a scoring dataset for prediction (i.e., creating all features used in model training), and 3) prediction of pest abundance through time. Based on a user-selected location, weather data will be harvested from the ASOS weather station system and geographic data (e.g., elevation) will be harvested from publicly available APIs like the USGS National Map services. Our scripts will combine and transform the weather and geographic data into a scoring dataset as dictated by the features used in model training. Finally, the scripts will use pre-trained models from Objective 1 to make phenological predictions for presentation to the user.
2b. Design and develop user interface
The design of a dashboard-style user interface will be a collaborative process between scientists at partnering universities, developers at EcoData Technology, and end users of the tool. We will conduct interviews with individual extension agents and with small groups (2-5) of agents to elicit the details and nuances of their workflows and to determine the most critical features for inclusion in the interface. With a detailed dashboard design in hand, software developers at EcoData Technology will develop a frontend application with the desired features that will present phenological predictions along with raw weather and geographic data to the end users.
2c. Containerize application for delivery and centralized web hosting with CI/CD
We will combine the backend and frontend components described above into a single Docker container. We will then host the container in the cloud on Amazon Web Services and make it publicly available, both to our engineers during development and to extension agents for user-acceptance testing and for soliciting feedback on improvements to the tool. Additionally, we will set up CI/CD (continuous integration / continuous deployment) to automate the testing and deployment of incremental changes to the application.
Objective 3
3a. Collection of aphid data over the course of the project
New aphid data will be collected by groups working with co-PI Seiter and co-PI Groves, directly coordinating with staff and volunteers that contribute samples to the Midwest Suction Trap Network. Data will be contributed to the repository managed by the network.
3b. Model validation and retraining
The new data collected by the suction trap network described above will be used to validate the predictions of models built in early phases of this project (Objective 1).
Furthermore, during each year of the grant, we will refit models with the new data to ensure models are up to date and provide the best possible predictions as the tool moves into production.
We will perform a model validation and retraining effort with each year of newly collected aphid data (3a). The approach, following best practices in data science, is to make predictions about new time periods (the "validation set" or "test set") with old models that were trained using data only from prior periods (the "training set"). For the purposes of communication to end users, we will use intuitive metrics of performance, either on the scale of the response (e.g., mean absolute error, MAE, in units of time) or expressed as percentages (mean absolute percentage error, MAPE). These metrics will be prepared into graphics, potentially comparing current performance with that of status quo methodologies.
3c. User outreach, training, and product evaluation
To ensure we create an application that is accurate, intuitive, and useful for extension professionals, we will engage extensively with extension users throughout all phases of this project. We primarily plan to communicate with extension users through our 16-member advisory board. In addition to initial design consultations, after a prototype of the web application is complete, we will share it with our advisory board and solicit their feedback. Via annual meetings and emails, we will distribute the aphid forecasting platform to advisory board members, who will then share the application with their local networks of extension specialists and county agents. We will also disseminate the application through the University of Delaware, University of Wisconsin, and University of Illinois via the PI's and co-PIs' extension programs and colleagues.
In the final year, we will provide a 'deeper-dive' training workshop for extension personnel, including a survey rating the utility and usability of the web tool.
To facilitate sustainability of this tool, we will invite workshop participation from potential future collaborators (e.g. University of Minnesota 'Aphid Alert' network, https://aphidalert.umn.edu/). Based on guidance from our advisory panel and extension collaborators, we will produce training materials (e.g. user guide, FAQs, tutorials) accessible from the web platform.
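The year-by-year validation described in 3b, in which each new year is predicted by a model trained only on prior years and then scored with MAE and MAPE, can be sketched as follows. This is a minimal sketch under stated assumptions: the function names are hypothetical, the sample values are invented day-of-year observations of a phenological indicator, and `mean_baseline` is a stand-in that always predicts the training mean, not one of the models proposed in Objective 1.

```python
from statistics import mean
from typing import Callable, Dict, List, Sequence, Tuple


def mae(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute error, on the scale of the response."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error; assumes no actual value is zero."""
    return 100.0 * mean(abs(a - p) / abs(a) for a, p in zip(actual, predicted))


def mean_baseline(training_values: Sequence[float]) -> Callable[[int], List[float]]:
    """Hypothetical stand-in model: always predicts the training mean."""
    center = mean(training_values)
    return lambda n: [center] * n


def walk_forward(observed_by_year: Dict[int, List[float]],
                 fit=mean_baseline) -> Dict[int, Tuple[float, float]]:
    """Score each year using a model fitted only on earlier years.

    Returns {test_year: (MAE, MAPE)}; the first year has no prior data
    and is used for training only.
    """
    years = sorted(observed_by_year)
    results = {}
    for i in range(1, len(years)):
        train = [v for y in years[:i] for v in observed_by_year[y]]
        test = observed_by_year[years[i]]
        predict = fit(train)
        predictions = predict(len(test))
        results[years[i]] = (mae(test, predictions), mape(test, predictions))
    return results


if __name__ == "__main__":
    # Invented day-of-year values for illustration only.
    observed = {2021: [190.0, 210.0], 2022: [200.0, 250.0], 2023: [200.0, 225.0]}
    for year, (err_days, err_pct) in walk_forward(observed).items():
        print(year, round(err_days, 2), round(err_pct, 2))
```

Because the training window grows by one year at each step, the same loop also covers the annual refit: the model fitted for the final test year is the one that would be carried forward.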