Performing Department
(N/A)
Non Technical Summary
Effective soil management is crucial for agricultural production, climate change mitigation and adaptation, and the provision of ecosystem services. Site-specific soil management is gaining significant attention with the emergence of digital soil mapping and remote sensing technologies. Additionally, the growing interest in soil organic carbon (SOC) sequestration and the soil carbon trading industry as a climate change mitigation strategy has accentuated the need for site-specific soil monitoring and management. However, the success of such soil management protocols depends on adequate field soil observations that accurately capture the spatial variability of the soil landscape. We therefore propose to integrate open-source geospatial information, including high-resolution digital elevation model data (e.g., LiDAR), gridded soil survey information (e.g., Purdue SoilExplorer), and time-series satellite images (e.g., Sentinel), for the spatial optimization of soil sampling design. This integration is expected to improve the cost-efficiency and accuracy of predictive soil mapping and to support precision soil management. Moreover, we will develop an automatic AI-based modeling tool for monitoring SOC changes based on input field observations and geospatial information. During the SEED phase, we will develop and evaluate the proposed methodology at Purdue research farms and several operational farms in northern Indiana and Illinois. The validated sampling optimization and predictive soil carbon models will be incorporated into a publicly accessible prototype software application. Following the SEED phase, we aim to expand the tool for regional and state-wide applications in precision soil management and climate-smart agriculture. The research will produce innovative, open-source tools to support sustainable soil management and resilient agroecosystems.
Animal Health Component
40%
Research Effort Categories
Basic
20%
Applied
40%
Developmental
40%
Goals / Objectives
The overarching goal of this project is to develop a public-facing geospatial tool and interface for regional and state-level assessment and monitoring of soil health parameters to support site-specific, sustainable soil management. Specific objectives include: (i) Establishing an automated protocol to extract geospatial information at various scales from multiple public domains, (ii) Developing a modeling framework for spatially optimized soil sampling design based on the in-field assessment of the spatial variability captured using the geospatial data stack, (iii) Gathering field soil data from Purdue ACRE farm and Kankakee and Big Pine Creek watersheds through Purdue SEND Lab for predicting SOC changes across the farm fields by integrating geospatial information using artificial intelligence, (iv) Producing a prototype Application Programming Interface (API) that incorporates the outcomes of objectives (ii) and (iii) to automate spatial optimization of soil sampling designs and to predict SOC changes across farm fields when new field observations are given. The API will be hosted on the Purdue SoilExplorer program.
Project Methods
Obj 1: Establish an automated protocol to extract geospatial information at various scales from multiple public domains: We will develop a Python-based automated system for querying and acquiring geospatial data from public data warehouses. The Purdue GIS library has repositories of LiDAR DEM coverage for the state of Indiana, which are accessible through the Purdue University Digital Forestry webpage. Similarly, the Purdue GIS library server hosts the gridded soil survey information produced by the Purdue SoilExplorer program, which provides national coverage of gridded soil information based on USDA SSURGO. The time-series Sentinel-2 images will be extracted from the European Space Agency's Copernicus Open Access Hub. The algorithm will extract the best cloud-free Sentinel-2 image scene for every month (April to October) in a given year to represent the whole growing season. The Sentinel-2 data inventory dates back to June 2015. All these geospatial datasets will be processed and resampled to 10 m spatial resolution (i.e., the resolution of Sentinel imagery) and will be stored as a raster stack when an area of interest is defined and queried. During the SEED phase, we will prepare this tool for geospatial data gathering to cover the Purdue ACRE farm as well as the selected farms in Kankakee and Big Pine Creek watersheds.
Obj 2: Develop a modeling framework for spatially optimized soil sampling design: A soil sampling protocol will be produced based on the spatial variability captured by the stack of geospatial covariates using the Conditioned Latin Hypercube sampling (CLHs) and Bhattacharyya distance techniques (Khan et al., 2023; Minasny & McBratney, 2006). First, we will derive a suite of topographic covariates, including slope, aspect, multi-resolution index of valley bottom flatness, etc. (a full list is provided in Paul et al. (2022)).
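As an illustration of how one of the topographic covariates named above could be derived, the following minimal sketch computes slope from a gridded DEM using central differences. It is only an assumption-laden example and not the project's implementation: the function name `slope_degrees`, the toy DEM, and the 10 m cell size are hypothetical, and an operational workflow would more likely rely on a dedicated terrain-analysis library.

```python
import math


def slope_degrees(dem, cell_size):
    """Slope in degrees for interior cells of a DEM given as a list of rows.

    Edge cells are left as None because central differences need a
    neighbor on both sides.
    """
    rows, cols = len(dem), len(dem[0])
    slope = [[None] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            # Elevation change per unit distance along each grid axis.
            dz_dx = (dem[r][c + 1] - dem[r][c - 1]) / (2 * cell_size)
            dz_dy = (dem[r + 1][c] - dem[r - 1][c]) / (2 * cell_size)
            slope[r][c] = math.degrees(math.atan(math.hypot(dz_dx, dz_dy)))
    return slope


# Hypothetical toy DEM (metres): elevation rises 1 m per 10 m cell eastward,
# i.e. a uniform 10 % grade.
CELL_SIZE = 10.0
toy_dem = [[float(c) for c in range(4)] for _ in range(4)]
toy_slope = slope_degrees(toy_dem, CELL_SIZE)
```

Every interior cell of the toy grid receives the same slope because the surface is a uniform plane; aspect and the other covariates would need their own derivations.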
We will also extract information on dominant soil parent materials and soil texture from the Purdue SoilExplorer program and SSURGO databases. A group of soil and vegetation indices (e.g., Soil Adjusted Vegetation Index, Soil Brightness Index, Normalized Difference Tillage Index, etc.) will be generated from the time-series Sentinel-2 imagery for each year. A 5-year mean will be calculated for the Sentinel-derived indices for 4 periods every year: pre-growing season (April/May), early-growing season (May/June), peak-growing season (July/August), and post-harvest season (September/October). All the covariates will be resampled to 10 m. An iterative variance inflation factor (VIF) analysis will be performed on all geospatial covariates to eliminate multicollinearity. A VIF cutoff value of 5 will be used, and any covariate whose VIF exceeds this threshold will be removed. The remaining covariates will be considered for sample optimization analysis. CLHs is a stratified random sampling technique that optimizes and selects sampling locations to represent the spatial variability of input geospatial environmental covariates (Minasny & McBratney, 2006). CLHs repeatedly searches through the covariate distribution, allowing for spatial optimization while preserving the multivariate correlation and the original distribution of the covariates. Bhattacharyya distance, which quantifies the similarity between probability distributions and compares the sample distribution with the original population, will be used as the metric for sample optimization (Khan et al., 2023). The sample design that minimizes the Bhattacharyya distance between the sample and the population of the feature space is considered the most spatially optimized. We will perform CLHs and Bhattacharyya distance analyses using Python packages written by Wagoner & Zheng (2019) and Williamson (2018). The sample optimization will be performed using a nested loop over increasing sample sizes (e.g., 5, 10, 20, ..., 200) in an iterative fashion (e.g.,
maximum iterations set at 5000), and the Bhattacharyya distance will be calculated for each sample size to understand the probability distribution.
Obj 3: Gather field soil data and predict SOC changes across the farm fields: We will produce a database of field soil observations from Purdue ACRE farm as well as from selected farms within the Kankakee and Big Pine Creek watersheds. The soil data will be provided as in-kind support by the ACRE farm manager and Purdue SEND lab (see attached supporting letters). This extensive soil database dates back to 2015, providing an excellent source for model development and validation. We will utilize the VIF-analyzed geospatial covariate stack from Obj. 2 for modeling SOC changes. A convolutional neural network model will be calibrated using TensorFlow in Python based on the soil datasets and geospatial covariates from 2022-2023. Different moving windows of multiple image pixels will be tested to extract spatial and contextual information from the geospatial covariates. The neural network will make SOC predictions based on the spatial correlation of the neighboring image pixels. 80% of the field soil data will be used for model calibration and 20% for model validation. Once the model is validated for the 2022-2023 period, it will be used to predict SOC distribution in an earlier period (e.g., 2015-2016) by changing the dynamic geospatial variables, i.e., Sentinel image-derived soil and vegetation indices. Comparing these predictions will produce a spatially explicit estimation of SOC changes. This SOC modeling will be performed for multiple farm fields (n=20, at least) within ACRE and Kankakee and Big Pine Creek watersheds and then will be applied to 5 test fields without calibrating the model for these specific test fields.
This approach will help determine the 'global' applicability of the developed model based on geospatial information.
Obj 4: Produce a prototype Application Programming Interface (API) that incorporates the outcomes of objectives 2 and 3: Finally, we will develop an API that will integrate the results from objectives 2 and 3 for automatic spatial optimization of soil sampling designs and prediction of SOC changes. During the SEED phase, the API will only cover the areas within the Purdue ACRE farm and the on-farm trial sites of the Purdue SEND lab in the Kankakee and Big Pine Creek watersheds. Beyond the SEED stage, we plan to develop the tool for regional and state-level assessments. The API will provide a graphical user interface for stakeholders in which (i) users provide an area of interest (i.e., field boundary), (ii) the API queries the geospatial covariates for that area of interest, (iii) the API runs a sample optimization model based on the approach described in Obj 2, and (iv) the API prepares spatially optimized 'smart' soil sampling designs, reporting the probability distribution and spatial variability captured at different sampling intensities. We will include a cost estimate for lab analysis of basic soil health properties at different sampling intensities. This will enable the users to select the sampling design based on their needs and available budget. By integrating results from Obj. 3, the API will also be able to predict SOC changes for a given field when new soil data are provided. The API will be hosted on the website and smartphone application of the Purdue SoilExplorer program through the Purdue GIS library server. The project team therefore has the advantage of using this widely known and trusted soil landscape visualization tool to host this new API.
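The sample-size loop described under Obj 2, together with the cost reporting described under Obj 4, can be sketched roughly as follows. This is a simplified illustration under stated assumptions and not the project's code: a plain random search stands in for CLHs, a single synthetic covariate stands in for the covariate stack, and the helper names, bin count, iteration count, and the per-sample lab cost `COST_PER_SAMPLE_USD` are all hypothetical. The Bhattacharyya distance is computed between binned distributions as the negative log of the Bhattacharyya coefficient.

```python
import math
import random


def histogram(values, edges):
    """Relative frequency of values in the bins defined by edges."""
    counts = [0] * (len(edges) - 1)
    last = len(counts) - 1
    for v in values:
        for i in range(len(counts)):
            # The final bin is closed on the right so the maximum is counted.
            if edges[i] <= v < edges[i + 1] or (i == last and v == edges[-1]):
                counts[i] += 1
                break
    total = sum(counts)
    return [c / total for c in counts]


def bhattacharyya_distance(p, q):
    """D_B = -ln(sum_i sqrt(p_i * q_i)) for two discrete distributions."""
    bc = sum(math.sqrt(a * b) for a, b in zip(p, q))
    if bc == 0.0:
        return float("inf")  # no overlap between the two distributions
    return max(0.0, -math.log(bc))  # clamp tiny negative rounding errors


def best_random_design(population, n_samples, edges, n_iter, rng):
    """Random-search stand-in for CLHs: keep the draw closest to the population."""
    pop_hist = histogram(population, edges)
    best_idx, best_dist = None, float("inf")
    for _ in range(n_iter):
        idx = rng.sample(range(len(population)), n_samples)
        sample_hist = histogram([population[i] for i in idx], edges)
        dist = bhattacharyya_distance(sample_hist, pop_hist)
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    return best_idx, best_dist


# Synthetic covariate values for 500 pixels (hypothetical, e.g. slope).
rng = random.Random(42)
population = [rng.gauss(5.0, 2.0) for _ in range(500)]
N_BINS = 10
lo, hi = min(population), max(population)
width = (hi - lo) / N_BINS
edges = [lo + i * width for i in range(N_BINS)] + [hi]

COST_PER_SAMPLE_USD = 25.0  # hypothetical lab cost per soil sample
results = []
for n in (5, 10, 20, 50, 100):
    idx, dist = best_random_design(population, n, edges, n_iter=200, rng=rng)
    results.append((n, dist, n * COST_PER_SAMPLE_USD))

for n, dist, cost in results:
    print(f"n={n:>3}  distance={dist:.4f}  lab cost=${cost:,.0f}")
```

Listing the distance next to the cost for each sample size is one way a user could judge where additional samples stop improving how well the design represents the field.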