Source: S4 MOBILE LABORATORIES LLC submitted to
SBIR PHASE II: USE OF IN-SITU SHALLOW SUBSURFACE SPECTROSCOPY FOR MEASURING SOIL ORGANIC CARBON
Sponsoring Institution
National Institute of Food and Agriculture
Project Status
ACTIVE
Funding Source
Reporting Frequency
Annual
Accession No.
1031067
Grant No.
2023-70449-40587
Cumulative Award Amt.
$650,000.00
Proposal No.
2023-04030
Multistate No.
(N/A)
Project Start Date
Sep 1, 2023
Project End Date
Aug 31, 2025
Grant Year
2023
Program Code
[8.4]- Air, Water and Soils
Project Director
Barrett, L.
Recipient Organization
S4 MOBILE LABORATORIES LLC
526 S MAIN ST STE 813C
AKRON,OH 443114401
Performing Department
(N/A)
Non Technical Summary
Key to soil health is maintenance of the soil's organic carbon content. Carbon credit markets incentivize practices that increase the soil organic carbon content so that the resulting carbon sequestration will contribute to stabilization of atmospheric carbon dioxide levels. However, such markets are not feasible without accurate and cost-effective means for verifying the soil organic carbon stock.
The ultimate goal of this effort is a prototype unit, the Subterra Green, that can rapidly and accurately map soil organic carbon in three dimensions to a depth of 90 cm. The unit employs a visible/near-infrared spectroscopic probe that is pushed into the soil at intervals and is small and maneuverable enough to be operated by one person. Building on the Phase I results, the specific objectives of the Phase II project are (1) to extend the generalizability of the Subterra method to a large, agriculturally important region of the U.S.; (2) to define the site characteristics for which a given model is applicable; (3) to improve and document the minimum change in per-site soil organic carbon stock detectable using the Subterra method; and (4) to continue to improve the accuracy and precision of per-sample soil organic carbon models.
Our approach is to extend generalizability while improving accuracy and precision by sampling a broad range of sites in the target area in order to increase the quality and volume of data input into machine learning models. Work is organized around four field campaigns in which a total of eight sites will be intensively sampled and mapped. Lower-intensity sampling will also be conducted at an additional 36 sites. To ensure optimal distribution of the input data, site selection and data collection will be evenly spread over a rubric of relevant environmental co-variates in the target area. Global, regional, and local models of soil organic carbon content will be developed, and model performance will be continuously tracked on the environmental co-variates rubric.
We expect to consistently achieve a precision of better than 0.3 Mg C per hectare for soil organic carbon stock determination, sufficient for verification in the carbon credit market. The record of model performance with respect to the co-variates rubric will demonstrate that these high accuracy levels are achievable in commercial application throughout the target area without the need for large numbers of additional calibration samples, so that the Subterra Green can be included among accepted protocols for measurement, recording, and verification in the carbon credit market. Beyond the scope of this Phase II project, but still important for commercial application in large agricultural settings, we plan to automate the collection of Subterra Green data by mounting it on a robotic unit.
Animal Health Component
30%
Research Effort Categories
Basic
(N/A)
Applied
30%
Developmental
70%
Classification

Knowledge Area (KA)    Subject of Investigation (SOI)    Field of Science (FOS)    Percent
101                    0110                              2000                      100%
Knowledge Area
101 - Appraisal of Soil Resources;

Subject Of Investigation
0110 - Soil;

Field Of Science
2000 - Chemistry;
Goals / Objectives
Our technology, the Subterra Green, is a mobile field unit with a visible and near-infrared (VNIR) spectroscopic probe and a load cell for measuring probe insertion force. As the unit moves across a target area, the probe is pushed into the soil at intervals, measuring light reflected from a column of soil immediately adjacent to the probe. These data, and the location and force required to insert the probe, are converted into a 3D map of subsurface volumetric soil organic carbon stock, as well as point measurements of soil organic carbon concentration and soil bulk density. These maps can be used to promote soil health and for soil carbon credit accounting.
Our Phase I project demonstrated that the Subterra Green can be used to estimate and map soil organic carbon concentration and stock with satisfactory accuracy at several study sites and across these sites. The Phase I project serves as a solid start to our broader goal of achieving commercial acceptance of the Subterra Green in the carbon credit market. Nevertheless, before the method can be applied commercially, it will be necessary to demonstrate that it consistently achieves accuracy and precision sufficient for measuring expected levels of change in soil organic carbon under a wide variety of soil types and environmental conditions. Therefore, our major goal is to extend the scope of the training data to encompass a far wider range of site types, to develop models that are applicable to broad geographical regions, and to define and track the parameters (e.g., soil types, topography, land use practices) that influence the generalizability of model performance.
The technical objectives of this Phase II project are:
(1) to extend the applicability of the Subterra method to a large, diverse, and agriculturally important land area within the U.S. while retaining high levels of prediction accuracy on novel data within this area;
(2) to define the site characteristics for which a given model is applicable;
(3) to document the minimum change in per-site soil organic carbon stock that is detectable using the Subterra Green in such a way as to be cost-effective for verification of carbon stock credits; and
(4) to continue to improve the accuracy and precision of per-sample models for soil organic carbon concentration, soil organic carbon stock, and bulk density.
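As a rough illustration of how point measurements of soil organic carbon concentration and bulk density relate to a carbon stock, the sketch below applies the standard unit conversion for a single soil layer and sums layers over a profile. The function names, layer thicknesses, and sample values are hypothetical. This is not the Subterra model itself, which predicts these quantities from spectra and insertion force.

```python
def soc_stock_mg_per_ha(soc_percent, bulk_density_g_cm3, thickness_cm):
    """SOC stock of one soil layer in Mg C per hectare.

    (soc_percent / 100) * BD (g/cm^3) * thickness (cm) gives g C per cm^2.
    With 1 ha = 1e8 cm^2 and 1 Mg = 1e6 g, the constant factors cancel,
    leaving soc_percent * BD * thickness.
    """
    return soc_percent * bulk_density_g_cm3 * thickness_cm


def profile_stock(layers):
    """Sum layer stocks; each layer is (SOC %, BD g/cm^3, thickness cm)."""
    return sum(soc_stock_mg_per_ha(*layer) for layer in layers)


# Hypothetical three-layer profile down to 90 cm.
example_profile = [(2.0, 1.3, 30.0), (1.0, 1.4, 30.0), (0.5, 1.5, 30.0)]
total_stock = profile_stock(example_profile)  # roughly 142.5 Mg C/ha
```

Against a stock of this size, the 0.3 Mg C per hectare precision target mentioned in the summary is a small fraction of the total, which is why errors in both concentration and bulk density matter.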
Project Methods
Efforts
Site selection. In order to ensure optimal distribution of the input data, data collection sites will be spread evenly over the range of relevant environmental co-variates in the target area (the "attribute space") as depicted in a rubric developed from mapped soil properties and terrain variables. The rubric will visualize the attribute space of the target area and guide and track site distribution. For each target area, we will develop a rubric using data from the USDA gSSURGO soil database and terrain properties derived from the USGS 1 arc-second digital elevation model. These two datasets will be masked to exclude non-cropland using a cropland data layer. The dimensionality of the dataset will be reduced using principal component analysis, and the reduced data will be clustered using k-means. Sampling sites and individual sampling locations will be tracked relative to the clusters. The initial target area is the Central Feed Grains and Livestock Land Resource Region (LRR M) as defined by the USDA, which encompasses some of the most highly productive cropland of the U.S. and has a high proportion of its area in cropland. For the first field season, we will work in Major Land Resource Area (MLRA) 111 inside the broader target area. If at the end of the field season these goals are met for the target area, subsequent field campaigns will expand to a new MLRA or LRR. If not, sampling will continue within the previous target area.
Data collection. Data collection activities will be organized around two different site types, intensive and extensive. Intensive sites will be used to produce a single-site model, test accuracy and precision of soil organic carbon stock estimation, and create 3D maps of soil organic carbon stock, while the extensive sites are designed to efficiently extend calibration into previously uncovered portions of attribute space, even though in isolation they are not adequate for site-specific modeling or 3D mapping.
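The rubric construction described under Site selection (principal component analysis followed by k-means) can be sketched as follows. This is a minimal illustration that assumes NumPy is available and uses synthetic numbers in place of the masked gSSURGO and terrain attributes. The function names, the number of components, and the number of clusters are all hypothetical choices, not the project's actual settings.

```python
import numpy as np


def pca_reduce(X, n_components):
    """Project standardized features onto their leading principal components."""
    # Assumes no feature is constant (std > 0).
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    # Rows of Vt are the principal axes, ordered by explained variance.
    _, _, Vt = np.linalg.svd(Xs, full_matrices=False)
    return Xs @ Vt[:n_components].T


def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm; returns (labels, centers)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Distance from every point to every center, then nearest center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute centers; keep the old center if a cluster is empty.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


# Synthetic stand-in for cropland cells: 2 groups x 50 cells x 4 attributes.
rng = np.random.default_rng(42)
cells = np.vstack([rng.normal(0.0, 0.5, (50, 4)), rng.normal(5.0, 0.5, (50, 4))])
scores = pca_reduce(cells, n_components=2)
labels, centers = kmeans(scores, k=2)
```

Each candidate sampling site could then be assigned the cluster label of the cell it falls in, which is one way to track sites relative to the clusters.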
At intensive sites, in a field of about 5 ha, probe insertions will be in a grid pattern with a spacing of 15 m. A soil core will be obtained at about 25% of the probe insertions. At each extensive site of between 20 and 100 ha in size, we will obtain 15 probe insertions and the corresponding 15 cores. Specific probe insertion locations will be chosen by stratifying the site according to attribute space characteristics and randomly sampling five points within each of three strata. Soil core samples will be analyzed for organic carbon content at a commercial laboratory. Samples will be divided into training (60%), validation (20%), and test (20%) sets using stratified random selection. To maintain independence of the validation and test sets, all samples from a given soil core will be assigned to the same set.
Data collection activities will be organized into four field campaigns, conducted in the fall and spring of 2023-2024 and of 2024-2025. A campaign will begin with rubric construction and site selection. Potential sites will be identified through contacts from partner organizations and coordination with the soil and water conservation districts of the target counties. Next, a two-person team will collect data at two intensive and nine extensive sites. Models will be updated continuously, with metrics visualized in an internal dashboard.
Modeling. The aim of the Phase II modeling activities will be to develop models that are generalizable to novel data in a broad geographical area while maintaining high levels of accuracy. We will evaluate model performance at site, regional, and global levels. For any given site, the baseline is the model trained only on data collected at that site. Regional and global models, trained on data from broader geographical regions, will be evaluated with the usual accuracy metrics (RMSE, R2, and RPIQ), but also by comparing the RMSE of validation of the broader model to that of the local site model.
Evaluation
Site selection.
Our goal is to collect data which will ensure both broad geographic coverage and definition of the site characteristics. At the end of each of the four field campaigns, we will evaluate the coverage within the current target area. The target is that data will be collected in >80% of the rubric clusters, fulfilling requirements for broad geographic coverage and definition of site characteristics. For generalizability, the target is that mean deltaRMSE will be better than -10%. DeltaRMSE is calculated as ((RMSEa - RMSEb) / RMSEb) * 100, where RMSEa is the RMSE of the broader model applied to the individual site validation set and RMSEb is the RMSE of the site-specific model. If at the end of a field campaign these goals are met for the target area, the subsequent field campaign will expand to a new MLRA or LRR. If not, sampling will continue within the previous target area.
Modeling. We will evaluate model performance at site, regional, and global levels. For any given site, the baseline is the model trained only on data collected at that site. Regional and global models, trained on data from broader geographical regions, will be evaluated with the usual accuracy metrics (RMSE, R2, and RPIQ), but also by comparing the RMSE of validation of the broader model to that of the local site model. The target metric is that models will consistently achieve RPIQ > 2.0 and R2 > 0.8.
As we incorporate new training data and new modeling techniques, we will continuously track deltaRMSE at the site, regional, and global levels, with the target being deltaRMSE (broad vs. site) better than -10%. To track model performance on novel sites, the metric deltaRMSE will be calculated for models developed both including and excluding data collected at the site from the training set, targeting deltaRMSE (excluded vs included) better than -10%.
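The evaluation metrics named above can be written out in a few lines. The sketch below is illustrative only: the function names and sample values are hypothetical, R2 is taken here as the coefficient of determination (1 - SS_res / SS_tot), and RPIQ is computed as the interquartile range of the observed values divided by RMSE, with the quartile method being an assumed choice. The deltaRMSE function follows the formula given in the text.

```python
import math
import statistics


def rmse(observed, predicted):
    """Root mean squared error."""
    sq_err = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(sq_err / len(observed))


def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_obs = statistics.fmean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def rpiq(observed, predicted):
    """Ratio of performance to interquartile range: IQR(observed) / RMSE."""
    q1, _, q3 = statistics.quantiles(observed, n=4, method="inclusive")
    return (q3 - q1) / rmse(observed, predicted)


def delta_rmse(rmse_a, rmse_b):
    """Percent difference of the broader model's RMSE (a) from the site model's (b)."""
    return (rmse_a - rmse_b) / rmse_b * 100.0


# Hypothetical validation values for one site (e.g. SOC in percent).
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.1, 1.9, 3.2, 3.8, 5.0]
site_rmse = rmse(observed, predicted)
```

By construction, a negative deltaRMSE means the broader model had the lower RMSE on that site's validation set, and a positive value means the site-specific model did.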
To establish the minimum change in per-site soil organic carbon stock that is detectable using the Subterra Green, for each of the intensive sites we will conduct a formal analysis of measurement uncertainty with a target of precision sufficient to detect an increase of 0.3 Mg organic carbon per hectare.
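The 60/20/20 split described under Data collection, in which all samples from one soil core stay in the same set, can be sketched as a split over cores rather than over individual samples. This is a minimal illustration: the stratification variable, core identifiers, and depth values are hypothetical, since the text does not specify what the stratified random selection is stratified on.

```python
import random
from collections import defaultdict


def split_cores(core_strata, fractions=(0.6, 0.2, 0.2), seed=0):
    """Assign whole cores to train/validation/test within each stratum.

    core_strata maps core_id -> stratum label (hypothetical, e.g. a rubric cluster).
    fractions[2] is implied: cores left after train and validation go to test.
    """
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for core_id, stratum in sorted(core_strata.items()):
        by_stratum[stratum].append(core_id)

    assignment = {}
    for stratum in sorted(by_stratum):
        cores = by_stratum[stratum]
        rng.shuffle(cores)
        n_train = round(fractions[0] * len(cores))
        n_val = round(fractions[1] * len(cores))
        for i, core_id in enumerate(cores):
            if i < n_train:
                assignment[core_id] = "train"
            elif i < n_train + n_val:
                assignment[core_id] = "validation"
            else:
                assignment[core_id] = "test"
    return assignment


# Hypothetical data: 3 strata x 5 cores, each core cut into three depth samples.
core_strata = {f"core{s}{i}": f"stratum{s}" for s in "ABC" for i in range(5)}
samples = [(core_id, depth) for core_id in core_strata for depth in (0, 30, 60)]
assignment = split_cores(core_strata)
# Every sample inherits the set of its parent core, so no core spans two sets.
sample_sets = {sample: assignment[sample[0]] for sample in samples}
```

Splitting at the core level avoids leakage between sets, because samples from adjacent depths in the same core are likely to be correlated.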

Progress 09/01/23 to 08/31/24

Outputs
Target Audience: The Subterra Green addresses our customers' need to accurately, quickly, and cheaply measure soil organic carbon. We have found two potential markets for the Subterra Green: agricultural customers and environmental customers. Our initial target agricultural customers are large-scale farmers, cooperative representatives (sharing a device among small farmers), and soil measurement service providers. We have been in contact with dozens of farmers in order to secure access to their land for measurement and mapping. Our initial target environmental customers are soil researchers (private and governmental), companies providing carbon credit measurement services, and environmentalists and government regulators tasked with verifying carbon sequestration. Several commercial environmental research companies are interested in evaluating our technology.
Changes/Problems: Our goal was to have sampled 18 extensive sites and four intensive sites by this point in the grant performance period, but we have in fact sampled 17 extensive sites and no intensive sites. Since progress in modeling depends on the data obtained by the sampling, progress towards these goals, and documentation using the target metrics, has also been delayed. In particular, the fact that we have not yet conducted sampling at an intensive site has prevented us from documenting model accuracy and precision for SOC stock (Objective #3). The delay in sampling is primarily due to problems with the availability of our prototype Subterra unit that occurred early in the performance period. We had contracted with a manufacturing partner to make two new prototype Subterra units in spring of 2023, prior to the beginning of the grant performance period in September 2023. We were promised that the units would be delivered to us by the beginning of July 2023.
On the strength of that promise, our existing prototype was sold (as a Subterra Grey forensic model) in June 2023, leaving us without a working unit. However, the manufacturing partner was unable to satisfactorily complete our prototypes, so we retrieved them in an unfinished state in October 2023 and brought them for finishing to our original manufacturing partner. In addition, the manufacturer of the motor used to drive the probe ceased production of the motor. The newer Subterra prototypes therefore use a motor from a different manufacturer, which we needed to incorporate into our control application. We therefore did not have a functioning Subterra unit until after the beginning of 2024, which put us significantly behind the work plan described in the grant proposal.
What opportunities for training and professional development has the project provided? The USDA SBIR Phase II project employs students (including graduate students) from the University of Akron to perform both field and lab work. All work done by students is done under close supervision and guidance of an S4 scientist. Field work teaches basic skills in laying out grids, using the Subterra Green to measure SOC, and calibration techniques such as taking core samples for data validation. Lab work includes the skills needed for soil analysis, such as bulk density measurement.
How have the results been disseminated to communities of interest? Dissemination of information is currently done through the S4 Mobile Laboratories web site (www.s4laboratories.com) and attendant social media (i.e., LinkedIn blogs). The majority of our discussions have been directly with researchers through academic channels, trade shows, etc. Since we are not actively selling the Subterra Green, these activities will increase in August 2025.
What do you plan to do during the next reporting period to accomplish the goals? During the next reporting period, we will complete our field sampling campaigns.
By the end of the next reporting period we will have sampled a total of 36 extensive sites (of which 17 have already been completed) and eight intensive sites. Using the data collected from these samples, we will improve our models for predicting SOC and BD using the Subterra Green spectral data. For the intensive sites, we will map SOC stock in three dimensions, and test our model's accuracy and precision for predicting SOC stock.

Impacts
What was accomplished under these goals?
(1) To meet the requirements of extending generalizability of our SOC models while maintaining high accuracy and precision, our sampling choices are aimed at increasing both the quantity and the quality of the data available for training the machine learning models. Machine learning algorithms require large volumes of training data, but it is important that the training dataset encompass examples distributed throughout the range of target property variability. Soil spectroscopy requires a database that widely samples the soil variability within the study area. The relationships between spectra and soil properties can be both spatially dependent and highly non-linear, and it is difficult to construct a training set that adequately reflects the immense variation found in soils. We are therefore tracking the environmental factors that may influence spectral variation in the sites that we select, and are attempting to ensure that the training dataset fully encompasses the variability in our target population, defined as the soils of the cropland area of MLRA 111. In the two field campaigns of the project period, we have completed sampling at 17 extensive sites, distributed primarily in MLRA 111A. Two sites are located in MLRA 111B, and five in MLRA 139. We chose to sample in MLRA 139 in order to provide a coherent body of data from outside MLRA 111 that will act as a bridge between our newly-collected Phase II data and data previously collected during our Phase I activities, which were all from outside MLRA 111. We discuss our choice of site locations more fully under Objective #2, below. To date, we have not sampled at any intensive sites.
Because the sampling activities at intensive sites are designed to test accuracy and precision of SOC stock estimation, we chose to delay these until we had developed good global models using data from extensive sites, which we have only recently had sufficient calibration data to develop (see discussion in section (4)).
(2) We tested a variety of soil and environmental datasets and different methods of combining them into a rubric that would be as comprehensive as possible and yet easy to visualize and understand, and would also be readily extensible into new target regions over time. Ultimately we settled on a primary rubric that allows for straightforward tracking of overall site type, while also retaining the option of using secondary attributes that can be visualized as necessary.
(3) This objective depends on data collected from the intensive site types, which we have not yet sampled, as discussed above in sections (1) and (2). Challenges contributing to the slippage in progress toward the goal are discussed below.
(4) The aim of the Phase II modeling activities is to develop models that are generalizable to novel data in a broad geographical area while maintaining high levels of accuracy. Towards that end we have engaged in a number of activities: We have collected new data for model training, as described above. We have monitored procedures and data throughout the data collection and modeling processes in order to maintain quality data and eliminate spurious data points. This has included obtaining new sampling equipment for collecting the soil cores, standardizing lab procedures for measuring BD and subsampling for OC measurement, and carefully examining spectra in order to eliminate noisy outlier spectra before they are added to our training dataset.
We have expanded the set of land surface parameters (LSPs) calculated from ancillary data sources (primarily elevation grids) that can be included with the training data and have calculated these LSPs for a wider range of spatial scales than we did for our Phase I activities. In Phase I results, LSPs generally provided a slight performance boost, which we believe may be increased with a better match between spatial scale and soil processes. We have explored new machine learning model types. In particular, we are beginning to use artificial neural network (ANN) models with encouraging results, and when our dataset becomes large enough we will also try both 1D and 2D convolutional neural network (CNN) model types. We have identified, obtained, and begun modeling with a large public soil spectral library that may also help to boost model performance. The chosen library is the Natural Resources Conservation Service's Rapid Carbon Assessment (RaCA) library (Wills et al. 2014; Staff and Loecke 2016; Wijewardane et al. 2016), which is appropriate for our purposes not only because it includes data from throughout the conterminous United States, but also because, unlike many other such libraries, it includes both surface and subsurface samples with information about the depth from which the samples were obtained. The main drawback of this library is that its samples were scanned in the laboratory in a dry, ground state, so its utility when combined with our moist, in-situ spectra is as yet unknown.

Publications