Performing Department
(N/A)
Non Technical Summary
Key to soil health is maintenance of the soil's organic carbon content. Carbon credit markets incentivize practices that increase the soil organic carbon content so that the resulting carbon sequestration will contribute to stabilization of atmospheric carbon dioxide levels. However, such markets are not feasible without accurate and cost-effective means for verifying the soil organic carbon stock.
The ultimate goal of this effort is a prototype unit, the Subterra Green, that can rapidly and accurately map soil organic carbon in three dimensions to a depth of 90 cm. The unit, which is small and maneuverable enough to be operated by one person, employs a visible/near-infrared spectroscopic probe that is pushed into the soil at intervals. Building on the Phase I results, the specific objectives of the Phase II project are (1) to extend the generalizability of the Subterra method to a large, agriculturally important region of the U.S.; (2) to define the site characteristics for which a given model is applicable; (3) to improve and document the minimum change in per-site soil organic carbon stock detectable using the Subterra method; and (4) to continue to improve the accuracy and precision of per-sample soil organic carbon models.
Our approach is to extend generalizability while improving accuracy and precision by sampling a broad range of sites in the target area in order to increase the quality and volume of data input into machine learning models. Work is organized around four field campaigns in which a total of eight sites will be intensively sampled and mapped. Lower-intensity sampling will also be conducted at an additional 36 sites. To ensure optimal distribution of the input data, site selection and data collection will be evenly spread over a rubric of relevant environmental co-variates in the target area.
Global, regional, and local models of soil organic carbon content will be developed, and model performance will be continuously tracked on the environmental co-variates rubric.
We expect to consistently achieve a precision of better than 0.3 Mg C per hectare for soil organic carbon stock determination, sufficient for verification in the carbon credit market. The record of model performance with respect to the co-variates rubric will demonstrate that these high accuracy levels are achievable in commercial application throughout the target area without the need for large numbers of additional calibration samples, so that the Subterra Green can be included among accepted protocols for measurement, recording, and verification in the carbon credit market. Beyond the scope of this Phase II project, but still important for commercial application in large agricultural settings, we plan to automate the collection of Subterra Green data by mounting it on a robotic unit.
Animal Health Component
30%
Research Effort Categories
Basic
(N/A)
Applied
30%
Developmental
70%
Goals / Objectives
Our technology, the Subterra Green, is a mobile field unit with a visible and near-infrared (VNIR) spectroscopic probe and a load cell for measuring probe insertion force. As the unit moves across a target area, the probe is pushed into the soil at intervals, measuring light reflected from a column of soil immediately adjacent to the probe. These data, together with the location of each insertion and the force required to insert the probe, are converted into a 3D map of subsurface volumetric soil organic carbon stock, as well as point measurements of soil organic carbon concentration and soil bulk density. These maps can be used to promote soil health and for soil carbon credit accounting.
Our Phase I project demonstrated that the Subterra Green can be used to estimate and map soil organic carbon concentration and stock with satisfactory accuracy at several study sites and across these sites. The Phase I project serves as a solid start toward our broader goal of achieving commercial acceptance of the Subterra Green in the carbon credit market. Nevertheless, before the method can be applied commercially, it will be necessary to demonstrate that it consistently achieves accuracy and precision sufficient for measuring expected levels of change in soil organic carbon under a wide variety of soil types and environmental conditions. Therefore, our major goal is to extend the scope of the training data to encompass a far wider range of site types, to develop models that are applicable to broad geographical regions, and to define and track the parameters (e.g., soil types, topography, land use practices) that influence the generalizability of model performance.
The technical objectives of this Phase II project are:
(1) to extend the applicability of the Subterra method to a large, diverse, and agriculturally important land area within the U.S. while retaining high levels of prediction accuracy on novel data within this area;
(2) to define the site characteristics for which a given model is applicable;
(3) to document the minimum change in per-site soil organic carbon stock that is detectable using the Subterra Green in such a way as to be cost-effective for verification of carbon stock credits; and
(4) to continue to improve the accuracy and precision of per-sample models for soil organic carbon concentration, soil organic carbon stock, and bulk density.
Project Methods
Efforts
Site selection. To ensure optimal distribution of the input data, data collection sites will be spread evenly over the range of relevant environmental co-variates in the target area (the "attribute space") as depicted in a rubric developed from mapped soil properties and terrain variables. The rubric will visualize the attribute space of the target area and will guide and track site distribution. For each target area, we will develop a rubric using data from the USDA gSSURGO soil database and terrain properties derived from the USGS 1 arc-second digital elevation model. These two datasets will be masked to exclude non-cropland using a cropland data layer. The dimensionality of the dataset will be reduced using principal component analysis, and the reduced data will be grouped using k-means clustering. Sampling sites and individual sampling locations will be tracked relative to the clusters. The initial target area is the Central Feed Grains and Livestock Land Resource Region (LRR M) as defined by the USDA, which encompasses some of the most highly productive cropland of the U.S. and has a high proportion of its area in cropland. For the first field season, we will work in Major Land Resource Area (MLRA) 111 inside the broader target area. If at the end of the field season the coverage and generalizability targets (see Evaluation) are met for the target area, subsequent field campaigns will expand to a new MLRA or LRR. If not, sampling will continue within the previous target area.
Data collection. Data collection activities will be organized around two different site types, intensive and extensive. Intensive sites will be used to produce a single-site model, test accuracy and precision of soil organic carbon stock estimation, and create 3D maps of soil organic carbon stock. Extensive sites are designed to efficiently extend calibration into previously uncovered portions of attribute space, even though in isolation they are not adequate for site-specific modeling or 3D mapping.
At intensive sites, in a field of about 5 ha, probe insertions will be in a grid pattern with a spacing of 15 m. A soil core will be obtained at about 25% of the probe insertions. At each extensive site of between 20 and 100 ha in size, we will obtain 15 probe insertions and the corresponding 15 cores. Specific probe insertion locations will be chosen by stratifying the site by attribute-space characteristics and randomly sampling five points within each of three strata. Soil core samples will be analyzed for organic carbon content at a commercial laboratory. Samples will be divided into training (60%), validation (20%), and test (20%) sets using stratified random selection. To maintain independence of the validation and test sets, all samples from a given soil core will be assigned to the same set.
Data collection activities will be organized into four field campaigns, conducted in the fall and spring of 2023-2024 and of 2024-2025. A campaign will begin with rubric construction and site selection. Potential sites will be identified through contacts from partner organizations and coordination with the soil and water conservation districts of the target counties. Next, a two-person team will collect data at two intensive and nine extensive sites. Models will be updated continuously, with metrics visualized in an internal dashboard.
Modeling. The aim of the Phase II modeling activities will be to develop models that are generalizable to novel data in a broad geographical area while maintaining high levels of accuracy. We will evaluate model performance at site, regional, and global levels. For any given site, the baseline is the model trained only on data collected at that site. Regional and global models, trained on data from broader geographical regions, will be evaluated with the usual accuracy metrics (RMSE, R2, and RPIQ), but also by comparing the RMSE of validation of the broader model to that of the local site model.
Evaluation
Site selection.
Our goal is to collect data that will ensure both broad geographic coverage and definition of the site characteristics. At the end of each of the four field campaigns, we will evaluate the coverage within the current target area. The target is that data will be collected in >80% of the rubric clusters, fulfilling requirements for broad geographic coverage and definition of site characteristics. For generalizability, the target is that mean deltaRMSE will be better than -10%. DeltaRMSE is calculated as ((RMSEa - RMSEb) / RMSEb) * 100, where RMSEa is the RMSE of the broader model applied to the individual site validation set and RMSEb is the RMSE of the site-specific model. If at the end of a field campaign these goals are met for the target area, the subsequent field campaign will expand to a new MLRA or LRR. If not, sampling will continue within the previous target area.
Modeling. We will evaluate model performance at site, regional, and global levels. For any given site, the baseline is the model trained only on data collected at that site. Regional and global models, trained on data from broader geographical regions, will be evaluated with the usual accuracy metrics (RMSE, R2, and RPIQ), but also by comparing the RMSE of validation of the broader model to that of the local site model. The target is that models will consistently achieve RPIQ > 2.0 and R2 > 0.8.
As we incorporate new training data and new modeling techniques, we will continuously track deltaRMSE at the site, regional, and global levels, with the target being deltaRMSE (broad vs. site) better than -10%. To track model performance on novel sites, deltaRMSE will also be calculated between models developed with and without the site's own data in the training set, targeting deltaRMSE (excluded vs. included) better than -10%.
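The deltaRMSE comparison defined above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the function names (rmse, delta_rmse) and all numeric values are hypothetical and are not project data or project code.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error between paired observed and predicted values."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("observed and predicted must be non-empty and equal length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def delta_rmse(rmse_a, rmse_b):
    """deltaRMSE in percent: ((RMSEa - RMSEb) / RMSEb) * 100.

    rmse_a: RMSE of the broader model on the site validation set.
    rmse_b: RMSE of the site-specific model on the same validation set.
    """
    if rmse_b <= 0:
        raise ValueError("rmse_b must be positive")
    return (rmse_a - rmse_b) / rmse_b * 100.0


# Hypothetical validation set for one site (illustrative values only).
observed = [1.0, 2.0, 3.0, 4.0]
site_predictions = [1.1, 1.9, 3.1, 3.9]    # site-specific model
broad_predictions = [1.2, 1.8, 3.2, 3.8]   # regional or global model

rmse_site = rmse(observed, site_predictions)
rmse_broad = rmse(observed, broad_predictions)
site_delta = delta_rmse(rmse_broad, rmse_site)
print(f"deltaRMSE (broad vs. site): {site_delta:.1f}%")
```

With this sign convention, a positive value means the broader model has a larger validation error than the site-specific model, and a negative value means it has a smaller one.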
To establish the minimum change in per-site soil organic carbon stock that is detectable using the Subterra Green, for each of the intensive sites we will conduct a formal analysis of measurement uncertainty with a target of precision sufficient to detect an increase of 0.3 Mg organic carbon per hectare.
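The project text does not specify the form of the uncertainty analysis. As a rough sketch of how a per-site precision might translate into a minimum detectable change, the Python example below uses one common textbook approximation. It assumes two independent surveys with the same standard error, a two-sided test, and normally distributed errors. The function names, the significance level, the power, and the example standard error are all hypothetical. Only the 0.3 Mg C per hectare target comes from the text.

```python
import math
from statistics import NormalDist


def minimum_detectable_change(se_per_survey, alpha=0.05, power=0.8):
    """Approximate smallest change in mean stock detectable between two surveys.

    Assumes independent surveys with equal standard error (Mg C per hectare),
    a two-sided test at level alpha, and a normal approximation.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    se_difference = math.sqrt(2) * se_per_survey  # SE of the difference of two means
    return (z_alpha + z_power) * se_difference


def required_standard_error(target_change, alpha=0.05, power=0.8):
    """Per-survey standard error needed to detect target_change (same assumptions)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return target_change / ((z_alpha + z_power) * math.sqrt(2))


# Hypothetical per-survey standard error of site mean stock, in Mg C per hectare.
example_se = 0.1
mdc = minimum_detectable_change(example_se)
print(f"Minimum detectable change: {mdc:.2f} Mg C per hectare")

# Standard error that would be needed to meet the 0.3 Mg C per hectare target.
needed_se = required_standard_error(0.3)
print(f"Required per-survey standard error: {needed_se:.3f} Mg C per hectare")
```

Under these assumptions, the detectable change scales linearly with the per-survey standard error, so halving the standard error halves the change that can be detected.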