Source: PURDUE UNIVERSITY submitted to NRP
LEVERAGING OPEN-SOURCE GEOSPATIAL INFORMATION FOR SUSTAINABLE SOIL MANAGEMENT AND CLIMATE-SMART AGRICULTURE
Sponsoring Institution
National Institute of Food and Agriculture
Project Status
ACTIVE
Funding Source
Reporting Frequency
Annual
Accession No.
1032323
Grant No.
2024-67021-42522
Cumulative Award Amt.
$299,452.00
Proposal No.
2023-11642
Multistate No.
(N/A)
Project Start Date
Sep 15, 2024
Project End Date
Sep 14, 2026
Grant Year
2024
Program Code
[A1541]- Food and Agriculture Cyberinformatics and Tools
Recipient Organization
PURDUE UNIVERSITY
(N/A)
WEST LAFAYETTE,IN 47907
Performing Department
(N/A)
Non Technical Summary
Effective soil management is crucial for agricultural production, climate change mitigation and adaptation, and the provision of ecosystem services. Site-specific soil management is gaining significant attention with the emergence of digital soil mapping and remote sensing technologies. Additionally, the growing interest in soil organic carbon (SOC) sequestration and the soil carbon trading industry as a climate change mitigation strategy has accentuated the need for site-specific soil monitoring and management. However, the success of such soil management protocols depends on adequate field soil observations that accurately capture the spatial variability of the soil landscape. We, therefore, propose to integrate open-source geospatial information, including high-resolution digital elevation model data (e.g., LiDAR), gridded soil survey information (e.g., Purdue SoilExplorer), and time-series satellite images (e.g., Sentinel) for the spatial optimization of soil sampling design. This integration will significantly improve the cost-efficiency and accuracy of predictive soil mapping and support precision soil management. Moreover, we will develop an automatic AI-based modeling tool for monitoring SOC changes based on input field observations and geospatial information. During the SEED phase, we will develop and evaluate the proposed methodology at Purdue research farms and several operational farms in northern Indiana and Illinois. The validated sampling optimization and predictive soil carbon models will be incorporated into a publicly accessible prototype software application. Following the SEED phase, we aim to expand the tool for regional and state-wide applications in precision soil management and climate-smart agriculture. The research will produce innovative, open-source tools to support sustainable soil management and resilient agroecosystems.
Animal Health Component
40%
Research Effort Categories
Basic
20%
Applied
40%
Developmental
40%
Classification

Knowledge Area (KA) | Subject of Investigation (SOI) | Field of Science (FOS) | Percent
101 | 0199 | 2061 | 100%
Knowledge Area
101 - Appraisal of Soil Resources;

Subject Of Investigation
0199 - Soil and land, general;

Field Of Science
2061 - Pedology;
Goals / Objectives
The overarching goal of this project is to develop a public-facing geospatial tool and interface for regional and state-level assessment and monitoring of soil health parameters to support site-specific, sustainable soil management. Specific objectives include: (i) Establishing an automated protocol to extract geospatial information at various scales from multiple public domains, (ii) Developing a modeling framework for spatially optimized soil sampling design based on the in-field assessment of the spatial variability captured using the geospatial data stack, (iii) Gathering field soil data from Purdue ACRE farm and Kankakee and Big Pine Creek watersheds through Purdue SEND Lab for predicting SOC changes across the farm fields by integrating geospatial information using artificial intelligence, (iv) Producing a prototype Application Programming Interface (API) that incorporates the outcomes of objectives (ii) and (iii) to automate spatial optimization of soil sampling designs and to predict SOC changes across farm fields when new field observations are given. The API will be hosted on the Purdue SoilExplorer program.
Project Methods
Obj 1: Establish an automated protocol to extract geospatial information at various scales from multiple public domains: We will develop a Python-based automated system for querying and acquiring geospatial data from public data warehouses. The Purdue GIS library hosts repositories of LiDAR DEM coverage for the state of Indiana, which are accessible through the Purdue University Digital Forestry webpage. Similarly, the Purdue GIS library server hosts the gridded soil survey information produced by the Purdue SoilExplorer program, which provides national coverage of gridded soil information based on USDA SSURGO. The time-series Sentinel-2 images will be extracted from the European Space Agency's Copernicus Open Access Hub. The algorithm will extract the best cloud-free Sentinel-2 image scene for every month (April to October) in a given year to represent the whole growing season. The Sentinel-2 data inventory begins in June 2015. All these geospatial datasets will be processed and resampled to 10 m spatial resolution (i.e., the resolution of Sentinel imagery) and will be stored as a raster stack when an area of interest is defined and queried. During the SEED phase, we will prepare this tool for geospatial data gathering to cover the Purdue ACRE farm as well as the selected farms in Kankakee and Big Pine Creek watersheds.
Obj 2: Develop a modeling framework for spatially optimized soil sampling design: A soil sampling protocol will be produced based on the spatial variability captured by the stack of geospatial covariates using the Conditioned Latin Hypercube sampling (CLHs) and Bhattacharyya distance techniques (Khan et al., 2023; Minasny & McBratney, 2006). First, we will derive a suite of topographic covariates, including slope, aspect, multi-resolution index of valley bottom flatness, etc. (a full list is provided in Paul et al. (2022)).
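As a rough illustration of the simplest of these topographic covariates, the sketch below derives slope from a gridded DEM using finite differences. It is not the project's implementation: the function name `slope_degrees`, the synthetic elevation grid, and the use of NumPy are assumptions made for this example; only the 10 m cell size comes from the text above.

```python
import numpy as np

def slope_degrees(dem: np.ndarray, cell_size: float = 10.0) -> np.ndarray:
    """Slope in degrees for a 2-D elevation grid with square cells (hypothetical helper)."""
    # np.gradient returns the derivative along rows (y) first, then columns (x).
    dz_dy, dz_dx = np.gradient(dem.astype(float), cell_size)
    # Slope is the arctangent of the gradient magnitude (rise over run).
    return np.degrees(np.arctan(np.hypot(dz_dx, dz_dy)))

# Synthetic DEM (assumption): a plane rising 10 m per 10 m cell along x.
dem = np.tile(np.arange(5) * 10.0, (4, 1))
slope = slope_degrees(dem)
print(slope.shape, float(slope.mean()))
```

Aspect and valley-bottom flatness indices need more care (orientation conventions, multi-scale smoothing) and would normally come from a dedicated terrain-analysis package rather than a hand-written routine.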
We will also extract information on dominant soil parent materials and soil texture from the Purdue SoilExplorer program and SSURGO databases. A group of soil and vegetation indices (e.g., Soil Adjusted Vegetation Index, Soil Brightness Index, Normalized Difference Tillage Index, etc.) will be generated from the time-series Sentinel-2 imagery for each year. A 5-year mean will be calculated for the Sentinel-derived indices for 4 periods every year: pre-growing season (April/May), early-growing season (May/June), peak-growing season (July/August), and post-harvest season (September/October). All the covariates will be resampled to 10 m. An iterative variance inflation factor (VIF) analysis will be performed on all geospatial covariates to eliminate multicollinearity. A VIF cutoff value of 5 will be used, and any covariates with a VIF above this threshold will be removed. The remaining covariates will be considered for sample optimization analysis. CLHs is a stratified random sampling technique that optimizes and selects sampling locations to represent the spatial variability of input geospatial environmental covariates (Minasny & McBratney, 2006). CLHs repeatedly searches through the covariate distributions, allowing for spatial optimization while preserving the multivariate correlation and the original distribution of the covariates. Bhattacharyya distance, which quantifies the similarity between probability distributions, will be used as the metric for sample optimization by comparing the sample distribution with that of the original population (Khan et al., 2023). The sample design that minimizes the Bhattacharyya distance between the sample and the population of the feature space is considered the most spatially optimized. We will perform CLHs and Bhattacharyya distance analyses using Python packages written by Wagoner & Zheng (2019) and Williamson (2018). The sample optimization will be performed using a nested loop over increasing sample sizes (e.g., 5, 10, 20, ..., 200, ...), with iterative optimization at each size (e.g., maximum iterations set at 5000), and the Bhattacharyya distance will be calculated for each sample size to understand the probability distribution.
Obj 3: Gather field soil data and predict SOC changes across the farm fields: We will produce a database of field soil observations from Purdue ACRE farm as well as from selected farms within the Kankakee and Big Pine Creek watersheds. The soil data will be provided as in-kind support by the ACRE farm manager and Purdue SEND lab (see attached supporting letters). This extensive soil database traces back to 2015, providing an excellent source for model development and validation. We will utilize the VIF-analyzed geospatial covariate stack from Obj. 2 for modeling SOC changes. A convolutional neural network model will be calibrated using Python TensorFlow based on the soil datasets and geospatial covariates from 2022-2023. Different moving windows of multiple image pixels will be tested to extract spatial and contextual information from the geospatial covariates. The neural network will make SOC predictions based on the spatial correlation of the neighboring image pixels. 80% of the field soil data will be used for model calibration and 20% for model validation. Once the model is validated for the 2022-2023 year, it will be used to predict SOC distribution in an earlier year (e.g., 2015-2016) by changing the dynamic geospatial variables, i.e., Sentinel image-derived soil and vegetation indices. Comparing these predictions will produce a spatially explicit estimation of SOC changes. This SOC modeling will be performed for multiple farm fields (at least n=20) within ACRE and Kankakee and Big Pine Creek watersheds and then will be applied to 5 test fields without calibrating the model for these specific test fields.
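One possible way to organize the data partition described here is sketched below: whole fields are set aside as uncalibrated test fields, and the remaining observations are split 80/20 into calibration and validation sets. The function name `split_by_field`, the random per-observation 80/20 split, and the synthetic field identifiers (25 fields with 12 observations each) are assumptions for illustration; the text does not specify how the split will be drawn.

```python
import numpy as np

def split_by_field(field_ids, n_test_fields=5, calib_frac=0.8, seed=0):
    """Return index arrays (calibration, validation, held-out test). Hypothetical helper."""
    rng = np.random.default_rng(seed)
    field_ids = np.asarray(field_ids)
    # Hold out entire fields so the model never sees them during calibration.
    test_fields = rng.choice(np.unique(field_ids), size=n_test_fields, replace=False)
    is_test = np.isin(field_ids, test_fields)
    test_idx = np.flatnonzero(is_test)
    # Shuffle the remaining observations, then cut them 80/20.
    rest = rng.permutation(np.flatnonzero(~is_test))
    n_calib = int(round(calib_frac * rest.size))
    return rest[:n_calib], rest[n_calib:], test_idx

# Synthetic layout (assumption): 20 modeling fields + 5 test fields, 12 samples each.
field_ids = np.repeat(np.arange(25), 12)
calib, valid, test = split_by_field(field_ids)
print(len(calib), len(valid), len(test))
```

Holding out complete fields, rather than random points, keeps spatially neighboring samples from the same field out of both the training and test sets.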
This approach will help determine the 'global' applicability of the developed model based on geospatial information.
Obj 4: Produce a prototype Application Programming Interface (API) that incorporates the outcomes of objectives 2 and 3: Finally, we will develop an API that will integrate the results from objectives 2 and 3 for automatic spatial optimization of soil sampling designs and prediction of SOC changes. During the SEED phase, the API will only cover the areas within the Purdue ACRE farm and the on-farm trial sites of the Purdue SEND lab in the Kankakee and Big Pine Creek watersheds. Beyond the SEED stage, we plan to develop the tool for regional and state-level assessments. The API will provide a graphical user interface through which (i) stakeholders will provide an area of interest (i.e., field boundary), (ii) the API will query the geospatial covariates for that area of interest, (iii) run a sample optimization model based on the approach described in Obj 2, and (iv) prepare spatially optimized 'smart' soil sampling designs, reporting the probability distribution and spatial variability captured at different sampling intensities. We will include a cost estimate for lab analysis of basic soil health properties at different sampling intensities. This will enable the users to select the sampling design based on their needs and available budget. By integrating results from Obj. 3, the API will also be able to predict SOC changes for a given field when new soil data is provided. The API will be hosted on the website and smartphone application of the Purdue SoilExplorer program through the Purdue GIS library server. The project team therefore has the advantage of using this widely known and trusted soil landscape visualization tool to host this new API.
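The sample-size report described in Obj 2 and Obj 4 could look roughly like the sketch below, which computes the Bhattacharyya distance between histogram distributions of a sample and the population for increasing sample sizes and attaches a lab cost to each. Several parts are assumptions for illustration only: simple random subsampling stands in for CLHs (it is not CLHs), a single synthetic covariate replaces the covariate stack, the 25.0 cost per sample is a made-up figure, and the function names are hypothetical rather than those of the cited packages.

```python
import numpy as np

def bhattacharyya_distance(sample, population, bins=10):
    """D_B = -ln(sum_i sqrt(p_i * q_i)) for histograms on shared bin edges."""
    edges = np.histogram_bin_edges(population, bins=bins)
    p, _ = np.histogram(sample, bins=edges)
    q, _ = np.histogram(population, bins=edges)
    p = p / p.sum()
    q = q / q.sum()
    # Bhattacharyya coefficient, capped at 1 to guard against rounding error.
    bc = min(float(np.sum(np.sqrt(p * q))), 1.0)
    return max(0.0, -float(np.log(bc)))

def evaluate_sample_sizes(population, sizes, n_iter=200, cost_per_sample=25.0, seed=0):
    """For each size, keep the best (smallest) distance over n_iter random draws."""
    rng = np.random.default_rng(seed)
    rows = []
    for n in sizes:
        best = min(
            bhattacharyya_distance(rng.choice(population, size=n, replace=False), population)
            for _ in range(n_iter)  # random draws stand in for the CLHs search (assumption)
        )
        rows.append((n, best, n * cost_per_sample))
    return rows

# Synthetic covariate values for 2000 pixels (assumption).
covariate = np.random.default_rng(42).normal(loc=3.0, scale=1.0, size=2000)
report = evaluate_sample_sizes(covariate, sizes=[5, 10, 20, 50, 100, 200])
for n, dist, cost in report:
    print(f"n={n:>3}  distance={dist:.4f}  lab cost=${cost:,.2f}")
```

Plotting distance against sample size (or cost) would show where additional samples stop improving how well the design represents the population, which is the trade-off the interface is meant to expose to users.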

Progress 09/15/24 to 09/14/25

Outputs
Target Audience: Some preliminary results and project plans were presented at several stakeholder and scientific events within Purdue University. We will also present the preliminary results at the CANVAS meeting in Salt Lake City, UT, in November 2025. The abstract has been accepted for the conference.
Changes/Problems: Nothing Reported
What opportunities for training and professional development has the project provided? The project has trained an MSc student. The student will also stay involved in the next year of the project.
How have the results been disseminated to communities of interest? Preliminary model results have been presented at several stakeholder and scientific events within Purdue. We will also present the initial findings at the CANVAS conference in Salt Lake City, UT, in November 2025. The abstract has been accepted.
What do you plan to do during the next reporting period to accomplish the goals? We have confirmed the recruitment of a postdoc who will join the project in November 2025. Based on initial modeling conducted by the MSc student, the postdoc will lead the development of the prototype tool. The MSc student and the postdoc will collaborate to establish an automated protocol for extracting publicly available geospatial information. We have already identified the specific data sources and data types that will be included in the prototype tool. We plan to follow the timeline below:
• Oct-Dec, 2025: Establish an automated protocol to extract geospatial information from Google Earth Engine, the USDA SSURGO database, and the Soil Explorer server.
• Jan-Mar, 2026: Develop a server to host and perform the geospatial analysis.
• Apr-Jul, 2026: Implement the spatial optimization approaches (identified from this year's modeling) on the server to develop the prototype soil sampling tool.
• Aug-Sep, 2026: Evaluate and finalize the tool. Draft a methodology paper for publication in a peer-reviewed journal.

Impacts
What was accomplished under these goals?
1. Field soil data has been gathered from (i) Purdue ACRE and SERAC research centers, and (ii) Kankakee and Big Pine Creek watersheds. We collected soil data from multiple farms for training models of soil sampling optimization and prediction of soil maps. We worked directly with several farmers, farm managers, and Purdue University extension agents for data collection. We also collaborated with the NRCS office in Indianapolis to gather state-level geospatial datasets and soil survey datasets.
2. An extensive literature review has been performed to identify the statistical tools for the spatial optimization of soil sampling design. Our literature review indicated that the spatial variability of soil properties is poorly represented in soil management research, thereby limiting the application of that research to developing strategies for site-specific, precision management. We identified that spatial sample optimization can reduce the cost of modeling soil properties across farm fields by 50-60%.
3. We also trained and tested the initial models using the collected field data. The results indicated that a spatially optimized, reduced sample design performs as well as the full sample design for predicting soil properties, resulting in significant cost savings for the farmers and land managers interested in performing precision and digital soil management. The model results will now be applied to construct the prototype API tool as proposed.

Publications