Source: WOODS HOLE RESEARCH CENTER submitted to
FACT CIN: SOIL SPECTROSCOPY FOR THE GLOBAL GOOD
Sponsoring Institution
National Institute of Food and Agriculture
Project Status
COMPLETE
Funding Source
Reporting Frequency
Annual
Accession No.
1023775
Grant No.
2020-67021-32467
Cumulative Award Amt.
$999,702.00
Proposal No.
2019-07459
Multistate No.
(N/A)
Project Start Date
Sep 1, 2020
Project End Date
Aug 31, 2024
Grant Year
2020
Program Code
[A1541]- Food and Agriculture Cyberinformatics and Tools
Recipient Organization
WOODS HOLE RESEARCH CENTER
149 WOODS HOLE ROAD
FALMOUTH, MA 02540-1644
Performing Department
(N/A)
Non Technical Summary
Today's data-driven agriculture demands access to high-resolution spatial and temporal data streams, a demand that soil scientists in the United States and globally have struggled to meet. This FACT Coordinated Innovation Network will provide a global, open-access, open-source, easy-to-use platform for reliably predicting soil properties from infrared spectra. The platform will offer low-cost, meaningful, and accurate data streams to monitor soil health as it responds both to agricultural use and to conservation efforts. Diffuse reflectance spectroscopy is becoming an indispensable tool in soil science; however, several technical challenges still limit its broader application outside of research projects. This project will network soil spectroscopists with experts in informatics, data science, and software engineering to overcome some of the current bottlenecks preventing wider and more efficient use of soil spectroscopy. The network will define a common vocabulary, metadata requirements, spectral quality standards, and best-practice laboratory procedures. It will create software to quality-check, harmonize, and standardize spectra and soil data collections. The network will deliver a web-based software platform, the Global Soil Spectral Library, backed by multiple spectral databases and robust statistical models that derive soil properties from the spectral data. Demonstration, outreach, and educational activities will promote the use of the Global Soil Spectral Library and data-driven science as solutions to the soil data crisis.
Animal Health Component
25%
Research Effort Categories
Basic
50%
Applied
25%
Developmental
25%
Classification

Knowledge Area (KA)   Subject of Investigation (SOI)   Field of Science (FOS)   Percent
101                   0110                             2061                     50%
101                   0110                             2080                     25%
101                   0110                             2090                     25%
Goals / Objectives
The primary goal of this Coordinated Innovation Network (CIN) is to share knowledge and build a community of practice with a common goal of increasing the availability of soil data for numerous applications. This goal will be accomplished by transforming diffuse reflectance soil spectroscopy into a routine analytical tool and further demonstrating its value in soil biogeochemical modeling. Networking soil spectroscopy experts with experts in informatics, data science, and software engineering will be a critical step in overcoming some of the current bottlenecks preventing wider and more efficient use of soil spectroscopy. To transform soil spectroscopy into a routine analytical tool, this network will focus its activities on a range of basic and applied objectives:
Objective 1. Development of a global spectral library
Soil spectroscopy has primarily been a project-specific tool, but recently developed regional and national spectral libraries can make it a much more broadly applicable tool. This objective seeks to combine several existing regional and national spectral libraries into a single database that can then serve as a resource for all soil spectroscopy users. Multiple challenges need to be addressed in order to combine spectral libraries, including differences in spectral measurement protocols, instruments, sample preparation, and analytical methodologies. Many of these challenges also apply to the metadata (soil properties) in the calibration data sets.
Objective 2. Development of a global estimation service
Currently, estimating soil properties from diffuse reflectance spectra requires specialized statistical knowledge and either expensive commercial software packages or programming skills in open-source languages such as R or Python.
This project will develop a robust and intuitive web-based software platform where users can upload spectra and receive estimates of a number of soil properties with only a few mouse clicks. Because a one-size-fits-all solution is unlikely, the estimation service will host multiple model forms and provide ensemble predictions with associated uncertainty for each soil property.
Objective 3. Transferability of the estimation service
For the global estimation service to be useful to all spectroscopy users, the issue of differential instrument response between spectrometers needs to be resolved. This is a key research challenge that this CIN must address. Several efforts are already underway for visible and near-infrared spectroscopy, but no comparable efforts have been made for mid-infrared spectroscopy. In conjunction with a round-robin sample exchange among participating laboratories, several mathematical and sample-based methods for calibration transfer will be tested.
Objective 4. Accessibility and hosting of the database and estimation service
Both the database and the estimation service will be open community resources. There will be two main categories of data users: researchers interested in using the spectral libraries in their own research and development applications, and practitioners interested in getting predictions of soil properties for their own data. For the research category, the system will be used to compile and distribute raw data, analysis-ready data, and the results of queries (data exports). Copies of the raw data, when allowed by the original data owner, will be hosted on zenodo.org, each with a unique DOI. Analysis-ready data and data exports will be shared using selected Open Geospatial Consortium (OGC) standards, with metadata fully documented via the project homepage. The system's REST API will contain links to specific metadata (e.g. per site, per organization, and/or per project ID).
For the practitioner category, the system will allow users to upload new spectra in a variety of formats via a web interface and will provide predictions with associated uncertainty metrics as simple tabular data.
Objective 5. Education, outreach and use
This objective has two target audiences. First, we aim to provide current and potential soil spectroscopy users with best-practice guidance for contributing to and using our global soil spectroscopy service. Second, we will demonstrate the utility of this approach for providing robust, low-cost estimates of numerous soil properties through targeted research applications, including soil health and soil carbon monitoring and modeling.
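Objective 2 calls for ensemble predictions with associated uncertainty from multiple model forms. A minimal sketch, assuming each fitted model is simply a callable that maps a preprocessed spectrum to a predicted value (the name `ensemble_predict` and the toy models are hypothetical, not part of any project API), reports the mean across model forms as the prediction and their spread as a crude uncertainty measure:

```python
from __future__ import annotations

from statistics import mean, stdev
from typing import Callable, Sequence

# Hypothetical model signature: preprocessed spectrum -> predicted soil property value.
Model = Callable[[Sequence[float]], float]


def ensemble_predict(models: Sequence[Model], spectrum: Sequence[float]) -> tuple[float, float]:
    """Return (ensemble mean, between-model standard deviation) for one spectrum.

    The between-model spread only reflects disagreement among model forms;
    it is not a calibrated prediction interval.
    """
    if len(models) < 2:
        raise ValueError("need at least two model forms to estimate spread")
    predictions = [model(spectrum) for model in models]
    return mean(predictions), stdev(predictions)


if __name__ == "__main__":
    # Three toy "model forms" standing in for, e.g., PLSR, Cubist and a neural network.
    toy_models = [lambda s: 10 * s[0], lambda s: 5 * s[1], lambda s: 4 * s[2]]
    print(ensemble_predict(toy_models, [0.2, 0.4, 0.6]))
```

A calibrated interval would need an additional step, since agreement among model forms does not guarantee that they are all correct.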
Project Methods
This project will have three major focus areas: data enrichment, data access, and data insights. A combination of technical teams and working groups, composed of self-selected network members, will be employed to advance the use of diffuse reflectance spectroscopy (DRS) as a routine analytical tool.
The first task of the network will be drafting a domain model design for a Global Soil Spectral Library (GSSL). A CIN working group will develop a controlled language, procedures, and suggestions for the database structure, and will allow continual refinement of the domain model. This work will begin by 1) compiling the sampling, sample preparation, and measurement protocols used by participants in the network; 2) identifying where protocols differ enough to be considered distinct and describing these in a vocabulary; and 3) developing scripts to connect data contributions to the GSSL database.
The second task consists of building the main architecture of the GSSL itself, which has three main components: 1) GSSL DB: a global compilation of shared soil spectroscopy data and associated soil properties, entered into a fast and scalable database; 2) GSSL Engine: a core computing library optimized for GSSL DB that will enable user-directed predictive modeling (also available as an R package). The library will be integrated into a Django REST service running on High Performance Computing infrastructure optimized for serving multi-array (raw soil spectroscopy) and geographical data; and 3) GSSL data viewer: a web-mapping interface allowing easy geo-referenced selection, subsetting, and download of data, as well as visualization of data, model performance, and outputs.
To tackle the transferability of the estimation service across instruments, a CIN working group will be formed with the task of recommending a set of solutions that can then be implemented by the GSSL development technical team. This group will explore a set of interrelated questions: When is calibration transfer necessary?
Can different combinations of spectral pretreatment and model choice account for differences in spectral response between instruments? Is there a reference standard that can be used to calibrate mid-infrared (MIR) spectra to a common standard, as has been developed for vis-NIR soil spectra? If sample-dependent calibration transfer is necessary, what are the best statistical approaches? A common set of soil samples will be analyzed by a number of participating laboratories to provide much of the data needed to address these questions.
There is no agreed-upon best algorithm for spectroscopy-based prediction of soil properties, and different model forms may be preferable depending on the characteristics of the spectral data and the soil property of interest. A CIN working group will convene to explore this topic in detail, beginning with an up-to-date review of currently employed modeling approaches. With already available datasets, this group will have access to well over 100,000 spectra with associated laboratory data on dozens of soil properties to develop a set of recommendations. To explore out-of-the-box solutions, this group will consider running a Kaggle or AIcrowd competition to spur innovation from the data science world. A critical component in translating spectroscopy into a routine production tool is developing statistics that indicate when predictions should not be trusted (i.e. when a new sample falls outside the calibration space) and reporting predictions with robust estimates of their uncertainty. This working group will explore and report on optimal solutions for both of these problems.
To raise awareness of the GSSL and soil spectroscopy in general, a CIN working group will be charged with leading demonstration and outreach activities. First, a basic spectral lab consisting of a drying oven, a soil grinder, a portable MIR spectrometer, and a laptop computer will be available to this group for demonstrations with different users. For example, there is growing interest in DRS among commercial soil testing laboratories. This group will organize various demonstrations through existing networks, meetings, and workshops. In year 3 of the project, the group will have a display booth to demonstrate the GSSL at the Soil Science Society of America Meeting. The GSSL will also be piloted through the ICRAF/AfSIS Africa-Asia lab network and through other use cases with the private sector in Africa.
Soil property data are a critical gap in soil biogeochemical model development. Soils are extremely heterogeneous, making it challenging to take a full suite of soil carbon measurements at the necessary spatial and temporal resolutions. Our biogeochemical modeling work will explore the implications of high spatial-temporal resolution data with robust uncertainty bounds for estimating landscape-level biogeochemical cycling via CENTURY-style first-order linear decomposition models. In addition, we will explore linking soil spectroscopy data to non-traditional soil properties such as microbial biomass, extracellular enzyme concentration, and mineral-bound fraction data to inform newer process-rich soil models such as MEND and MIMICS, which have explicit non-linear soil process representations. A question we will explore in both frameworks is: how precise and how frequent do the inferred soil properties need to be to improve model performance in comparison with laboratory measurements?
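To illustrate why uncertainty bounds on spectrally inferred soil properties matter for CENTURY-style models, the sketch below uses a deliberately simplified stand-in: a single carbon pool with first-order linear decay, dC/dt = I - kC, into which an uncertain initial stock C0 (as might come from a spectral prediction) is propagated by Monte Carlo sampling. The single-pool structure, the function names, and all parameter values are illustrative assumptions; CENTURY itself uses several interacting pools.

```python
import math
import random
from statistics import mean, stdev


def one_pool_carbon(c0: float, inputs: float, k: float, years: float) -> float:
    """Analytical solution of dC/dt = inputs - k * C (first-order linear decay).

    c0: initial carbon stock; inputs: annual carbon input; k: decay rate (1/yr).
    """
    c_steady = inputs / k  # stock at which inputs balance decomposition
    return c_steady + (c0 - c_steady) * math.exp(-k * years)


def propagate_c0_uncertainty(c0_mean, c0_sd, inputs, k, years, n=5000, seed=1):
    """Monte Carlo propagation of a Gaussian uncertainty on the initial stock."""
    rng = random.Random(seed)
    draws = [
        one_pool_carbon(rng.gauss(c0_mean, c0_sd), inputs, k, years)
        for _ in range(n)
    ]
    return mean(draws), stdev(draws)


if __name__ == "__main__":
    # Hypothetical values: 60 Mg C/ha +/- 6 from a spectral prediction,
    # 2 Mg C/ha/yr of inputs, k = 0.02 /yr, 20-year projection.
    print(propagate_c0_uncertainty(60.0, 6.0, 2.0, 0.02, 20.0))
```

Because this model is linear in C0, the propagated standard deviation should be close to c0_sd * exp(-k * years), which gives a quick sanity check on the simulation.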

Progress 09/01/20 to 08/31/24

Outputs
Target Audience: Our target audiences for this project were: 1) researchers with current interests in soil spectroscopy; 2) soil testing and agronomy services, particularly in countries without good soil testing laboratories; and 3) agricultural and global change modeling communities, which have identified the lack of spatially comprehensive information on soil properties as a key constraint.
Changes/Problems: Nothing Reported
What opportunities for training and professional development has the project provided? Following the release of the second version (v1.2) of the Open Soil Spectral Library (OSSL), we compiled training resources on how to use the OSSL effectively. These included a blog post (https://soilspectroscopy.org/ossl-version-1-2-released/) and a training webinar (https://av.tib.eu/media/63143), which together have been viewed over 4000 times. At the 2023 tri-societies meeting in St. Louis, MO, we hosted a half-day hands-on training workshop on predictive soil spectroscopy using the R statistical software (the training material is openly published at https://soilspectroscopy.github.io/soilspec-workshop/), convened a cross-divisional sponsored symposium on soil spectroscopy, and sponsored contributed oral and poster sessions on soil spectroscopy in the Soil Chemistry Division. Following the tri-societies meeting, we hosted an invitation-only workshop titled "The future of soil spectroscopy", where 30 participants discussed and drafted a position paper on the most important outstanding topics in the field. The project also supported the training and professional development of a graduate student, Dellena Bloom, at the University of Florida.
How have the results been disseminated to communities of interest? In addition to multiple peer-reviewed publications (11 and counting) and presentations at major scientific conferences (9 and counting), this project continues to engage a broad audience through our website (soilspectroscopy.org) with educational materials, webinar recordings, blog posts, newsletters, and an archive of project outputs and other important literature on the topic of soil spectroscopy. The website has had more than 10,000 visitors from 148 countries since its launch three years ago. Our X account (https://x.com/soilspec) has sent out more than 300 posts on soil spectroscopy and currently has 831 followers.
What do you plan to do during the next reporting period to accomplish the goals? Nothing Reported

Impacts
What was accomplished under these goals?
Objectives 1, 2, 4 - These objectives have been fully accomplished. The Open Soil Spectral Library (OSSL) now includes more than 135,000 observations spanning (incompletely) the visible-and-near-infrared (VisNIR, 350-2500 nm), near-infrared (NIR, 1350-2550 nm), and mid-infrared (MIR, 600-4000 cm-1) ranges, with different numbers of samples in each range. Nearly 50 distinct soil properties are now included in the OSSL. The OSSL is publicly available (https://soilspectroscopy.github.io/ossl-manual/databaseaccess.html) through Google Cloud Storage, an application programming interface (API), MongoDB, and a web service called OSSL Explorer (https://explorer.soilspectroscopy.org/). We have also improved the documentation of the OSSL so that all original datasets are clearly recognized with recommended citations, ensuring that the original data producers are properly acknowledged. The OSSL estimation service consists of 144 pre-trained models that are accessible via Google Cloud Storage, the API, and a web service called OSSL Engine (https://engine.soilspectroscopy.org/). The predictive models were fitted with the mlr3 framework in the R statistical programming language, using the Cubist machine learning algorithm. The final modeling configuration was selected after several internal experiments benchmarking models and preprocessing methods. Another innovation is the use of the conformal prediction method to derive uncertainty intervals. In addition to the predicted value and its uncertainty, the OSSL model outputs include a flag indicating whether the spectrum to be predicted is represented in the feature space of the calibration set. All these improvements were made to increase the trustworthiness of the OSSL models for users. The development and performance of these models are now documented in a peer-reviewed publication just accepted by PLoS One.
Lastly, all the reproducible code is available on GitHub (https://github.com/soilspectroscopy). We also updated the OSSL Manual (https://soilspectroscopy.github.io/ossl-manual/) with the new dataset contributions, database description, access options, new predictive modeling workflow, and web applications. An overview of the OSSL was recently presented in a webinar, with the recording publicly available online (https://av.tib.eu/series/1078/soil+spectroscopy+for+the+global+good).
Objective 3 - The ring trial experiment was completed and has now been published in Geoderma (10.1016/j.geoderma.2023.116724). To recap, we sent a standard set of 70 samples to over 35 labs around the world to be scanned in both the mid-infrared and visible/near-infrared regions. These spectra were then used to assess instrument-to-instrument variability and to identify the best preprocessing and modeling methods for achieving high-quality predictions on as many instruments as possible. The main finding of this experiment is that, in the mid-infrared region, all instruments produce internally high-quality predictions, but not all instruments are compatible with existing soil spectral libraries such as those hosted in the OSSL. For highly contrasting instruments, calibration transfer functions (mathematical transformations of the spectra derived from a common set of scanned samples) were needed to produce acceptable results. The vis-NIR spectra are being analyzed by network members in Belgium, with results being prepared for peer review in late 2024.
Objective 5 - detailed more below.
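The conformal prediction intervals mentioned above for the OSSL models can be illustrated with split conformal regression, one common variant. This sketch assumes absolute residuals as the nonconformity score and a held-out calibration set that was not used to fit the model; it is not taken from the OSSL code base, and the function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def conformal_halfwidth(cal_observed: Sequence[float],
                        cal_predicted: Sequence[float],
                        alpha: float = 0.1) -> float:
    """Split-conformal half-width for a (1 - alpha) prediction interval.

    Uses absolute residuals on a held-out calibration set as nonconformity
    scores and returns the ceil((n + 1) * (1 - alpha))-th smallest score.
    """
    scores = sorted(abs(y - p) for y, p in zip(cal_observed, cal_predicted))
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Too few calibration samples for this alpha: the interval is unbounded.
        return math.inf
    return scores[rank - 1]


def conformal_interval(prediction: float, halfwidth: float) -> Tuple[float, float]:
    """Symmetric interval around a point prediction."""
    return prediction - halfwidth, prediction + halfwidth
```

With exchangeable data, intervals built this way cover the true value with probability of at least 1 - alpha marginally. Because the half-width here is constant, a locally adaptive score (for example, residuals scaled by an estimate of local difficulty) would be needed to obtain sample-specific interval widths.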

Publications

  • Type: Peer Reviewed Journal Articles Status: Published Year Published: 2024 Citation: Mitu, S. M., Smith, C., Sanderman, J., Ferguson, R. R., Shepherd, K., & Ge, Y. (2024). Evaluating consistency across multiple NeoSpectra (compact Fourier transform near-infrared) spectrometers for estimating common soil properties. Soil Science Society of America Journal, 88, 1324-1339. 10.1002/saj2.20678
  • Type: Peer Reviewed Journal Articles Status: Published Year Published: 2024 Citation: Hong, Y., Sanderman, J., Hengl, T., Chen, S., Wang, N., Xue, J., Zhui, Z., Peng, J., Chen, Y., & Shi, Z. (2024). Potential of globally distributed topsoil mid-infrared spectral library for organic carbon estimation. Catena, 235, 107628. 10.1016/j.catena.2023.107628
  • Type: Peer Reviewed Journal Articles Status: Published Year Published: 2023 Citation: Safanelli, J. L., Sanderman, J., Bloom, D., Todd-Brown, K., Parente, L. L., Hengl, T., and 48 others. (2023). An interlaboratory comparison of mid-infrared spectra acquisition: Instruments and procedures matter. Geoderma, 440, 116724. 10.1016/j.geoderma.2023.116724
  • Type: Conference Papers and Presentations Status: Published Year Published: 2023 Citation: Safanelli, J.L., Sanderman, J., Bloom, D.E., Todd-Brown, K.E., Parente, L.L., Hengl, T. (2023). Advances of the Soil Spectroscopy for Global Good (SS4GG) Initiative. AGU Fall Meeting. https://agu.confex.com/agu/fm23/meetingapp.cgi/Paper/1389273
  • Type: Conference Papers and Presentations Status: Published Year Published: 2023 Citation: Sanderman, J., Safanelli, J.L., Bloom, D.E., Parente, L.L., Hengl, T., Todd-Brown, K.E. (2023). Introducing the Open Soil Spectral Library. ASA, CSSA, SSSA International Annual Meeting. https://scisoc.confex.com/scisoc/2023am/meetingapp.cgi/Paper/149052
  • Type: Conference Papers and Presentations Status: Published Year Published: 2023 Citation: Safanelli, J.L., Sanderman, J., Bloom, D.E., Todd-Brown, K.E., Parente, L.L., Hengl, T. (2023). Interoperability of Shared Middle-Infrared Soil Spectral Libraries and Instruments. ASA, CSSA, SSSA International Annual Meeting. https://scisoc.confex.com/scisoc/2023am/meetingapp.cgi/Paper/148974
  • Type: Conference Papers and Presentations Status: Published Year Published: 2023 Citation: Smith, C., Sanderman, J., Mitu, S.M., Ge, Y., Shepherd, K.D., Ferguson, R. (2023). Application of a Handheld Near Infrared Spectrophotometer to Farm-Scale Soil Monitoring. ASA, CSSA, SSSA International Annual Meeting. https://scisoc.confex.com/scisoc/2023am/meetingapp.cgi/Paper/149852
  • Type: Conference Papers and Presentations Status: Published Year Published: 2023 Citation: Mitu, S.M., Ge, Y., Murad, M.O.F., Smith, C., Sanderman, J., Ferguson, R., Shepherd, K.D. (2023). Consistency Assessment of a Low-Cost, Fourier Transform Near-Infrared (FT-NIR) Spectrometer for Estimating Common Soil Properties. ASA, CSSA, SSSA International Annual Meeting. https://scisoc.confex.com/scisoc/2023am/meetingapp.cgi/Paper/150689
  • Type: Conference Papers and Presentations Status: Published Year Published: 2023 Citation: Bloom, D., Safanelli, J.L., Sanderman, J., Parente, L.L., Hengl, T., Todd-Brown, K.E. (2023). Haar Wavelet Transformations for MIR Soil Spectroscopy: Speed, Precision, Field-Wetness Signal-to-Noise Ratio, and Cross-Instrument Calibration. ASA, CSSA, SSSA International Annual Meeting. https://scisoc.confex.com/scisoc/2023am/meetingapp.cgi/Paper/150385


Progress 09/01/22 to 08/31/23

Outputs
Target Audience: Our target audiences for this project remain: 1) researchers with current interests in soil spectroscopy; 2) soil testing and agronomy services, particularly in countries without good soil testing laboratories; and 3) agricultural and global change modeling communities, which have identified the lack of spatially comprehensive information on soil properties as a key constraint.
Changes/Problems: Nothing Reported
What opportunities for training and professional development has the project provided? Following the release of the second version (v1.2) of the Open Soil Spectral Library (OSSL), we compiled training resources on how to use the OSSL effectively. These included a blog post (https://soilspectroscopy.org/ossl-version-1-2-released/) and a training webinar (https://av.tib.eu/media/63143), which together have been viewed over 1000 times. The project continues to support the training and professional development of one postdoctoral researcher, Dr. Jose Lucas Safanelli, based at Woodwell Climate, and a graduate student, Dellena Bloom, at the University of Florida.
How have the results been disseminated to communities of interest? This project continues to engage a broad audience through our website (soilspectroscopy.org) with educational materials, blog posts, newsletters, and an archive of project outputs and other important literature on the topic of soil spectroscopy. The website has had more than 10,000 visitors from 148 countries since its launch 2.5 years ago. Our Twitter account (https://twitter.com/soilspec) has sent out more than 200 tweets on soil spectroscopy and currently has 700 followers.
What do you plan to do during the next reporting period to accomplish the goals? Two final papers will be submitted, and hopefully published, in this final reporting period: 1) a paper in Geoderma documenting the results of the ring trial experiment (Objective 3); and 2) a paper in PLoS One fully describing the Open Soil Spectral Library and its performance.
These two papers represent the culmination of the main activities in this project. The full network workshop scheduled for the spring of 2023 was pushed back to October 2023 to coincide with the Tri-society meeting in St. Louis, MO. In addition to this "visioning" workshop, we have organized several events through the SSSA at the Tri-society meeting: 1) a half-day hands-on training workshop using R statistical software for predictive soil spectroscopy, 2) a cross-divisional sponsored symposium on soil spectroscopy, and 3) contributed oral and poster sessions on soil spectroscopy in the Soil Chemistry Division. We hope to draft and submit a community opinion paper on where the field of soil spectroscopy is headed as a key outcome of the visioning workshop.

Impacts
What was accomplished under these goals?
Objectives 1, 2, 4 - The Open Soil Spectral Library (OSSL) was updated from version 1.0 to 1.2, making the database larger and more representative of different global soil types. The new version includes four new datasets: i) the Scion Research mid-infrared (MIR) library, which contains a few hundred MIR spectra of New Zealand forest soils from different land uses and soil orders; ii) the University of Zurich permafrost MIR library, which contains more than 250 MIR spectra of permafrost soils from Canada; iii) the Serbian soil spectral library from the University of Novi Sad, which contains 135 unique MIR spectra of Serbian soils from different regions and soil types; and iv) the NeoSpectra NIR library, which contains more than 2,100 near-infrared (NIR) spectra of soils from the US, Ghana, Kenya, and Nigeria, measured with the NeoSpectra Handheld NIR Analyzer developed by Si-Ware. In total, the OSSL now includes more than 135,000 observations spanning (incompletely) the visible-and-near-infrared (VisNIR, 350-2500 nm), NIR (1350-2550 nm), and MIR (600-4000 cm-1) ranges, with different numbers of samples in each range. We also improved the selection of target soil properties; almost 50 distinct soil properties are included in OSSL version 1.2. The OSSL is publicly available (https://soilspectroscopy.github.io/ossl-manual/database-access.html) through Google Cloud Storage, an application programming interface (API), MongoDB, and a web service called OSSL Explorer (https://explorer.soilspectroscopy.org/). The second major update to the OSSL was the calibration of generally applicable prediction models that can be used by anyone in the world. We simplified the model types from the first version by relying on separate spectral ranges (not fusing VisNIR, NIR, and MIR) and by not using geocovariates as ancillary predictive information.
This was done because spatial coordinates and sampling dates are needed to properly use geolocated models, and soil spectroscopy users may not have this information in many situations. By combining two base models (using the KSSL library alone or the full OSSL) with the soil properties and spectral ranges of interest, depending on the availability of complete information for both, we have now publicly released 144 pretrained models via Google Cloud Storage, the API, and a web service called OSSL Engine (https://engine.soilspectroscopy.org/). The predictive models were fitted with the mlr3 framework in the R statistical programming language, using the Cubist machine learning algorithm. The final modeling configuration was selected after several internal experiments benchmarking models and preprocessing methods. Another innovative part of this model update is the use of the conformal prediction method to derive uncertainty intervals. In addition to the predicted value and its uncertainty, the OSSL model outputs include a flag indicating whether the spectrum to be predicted is represented in the feature space of the calibration set. All these improvements were made to increase the trustworthiness of the OSSL models for users. Lastly, all the reproducible code is available on GitHub (https://github.com/soilspectroscopy). We also updated the OSSL Manual (https://soilspectroscopy.github.io/ossl-manual/) with the new dataset contributions, database description, access options, new predictive modeling workflow, and web applications. An overview of the OSSL was recently presented in a webinar, with the recording publicly available online (https://av.tib.eu/series/1078/soil+spectroscopy+for+the+global+good). To wrap up this large project, we are about to submit a manuscript to PLoS One describing all the technical details of the OSSL and the modeling workflow.
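The flag indicating whether a spectrum is represented in the calibration feature space can be approximated in several ways. A minimal sketch, assuming the spectra have already been compressed into a few feature scores (for example, principal component scores) and using a standardized distance from the calibration centroid with an empirical percentile cutoff, is shown below. It is an illustrative stand-in: the OSSL's own flagging criterion may differ, and the function names are hypothetical.

```python
import math
from statistics import mean, pstdev
from typing import Callable, List, Sequence, Tuple


def fit_feature_space(cal_scores: Sequence[Sequence[float]],
                      quantile: float = 0.99) -> Tuple[Callable[[Sequence[float]], float], float]:
    """Return a distance function and a cutoff derived from calibration scores."""
    n_dims = len(cal_scores[0])
    centre = [mean(row[j] for row in cal_scores) for j in range(n_dims)]
    # Guard against zero spread so constant dimensions do not divide by zero.
    spread = [pstdev([row[j] for row in cal_scores]) or 1.0 for j in range(n_dims)]

    def distance(x: Sequence[float]) -> float:
        return math.sqrt(sum(((xj - c) / s) ** 2 for xj, c, s in zip(x, centre, spread)))

    # Cutoff: the chosen empirical quantile of calibration-set distances.
    cal_distances: List[float] = sorted(distance(row) for row in cal_scores)
    idx = min(len(cal_distances) - 1, math.ceil(quantile * len(cal_distances)) - 1)
    return distance, cal_distances[idx]


def outside_calibration_space(x: Sequence[float],
                              distance: Callable[[Sequence[float]], float],
                              cutoff: float) -> bool:
    """True when a new sample lies farther from the centroid than the cutoff."""
    return distance(x) > cutoff
```

On principal component scores, which are uncorrelated across the calibration set, this standardized distance is equivalent to a Mahalanobis distance in that reduced space.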
Objective 3 - The ring trial experiment was completed and one paper was submitted for publication. To recap, we sent a standard set of 70 samples to over 35 labs around the world to be scanned in both the mid-infrared and visible/near-infrared regions. These spectra were then used to assess instrument-to-instrument variability and to identify the best preprocessing and modeling methods for achieving high-quality predictions on as many instruments as possible. The main finding of this experiment is that, in the mid-infrared region, all instruments produce internally high-quality predictions, but not all instruments are compatible with existing soil spectral libraries such as those hosted in the OSSL. For highly contrasting instruments, calibration transfer functions (mathematical transformations of the spectra derived from a common set of scanned samples) were needed to produce acceptable results. The vis-NIR spectra are being analyzed by network members in Germany.
Objective 5 - detailed more below.
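The calibration transfer functions described under Objective 3 map spectra from a secondary instrument onto the response of a primary (library) instrument using samples scanned on both. As a deliberately simple illustration, the sketch below fits an independent slope and offset per band by least squares. This is a simplified relative of direct standardization and piecewise direct standardization (PDS), which regress each band on the whole spectrum or on a window of neighboring bands instead. It is an assumption for illustration, not necessarily the method used in the ring trial, and the function names are hypothetical.

```python
from statistics import mean
from typing import List, Sequence, Tuple

Spectrum = Sequence[float]


def fit_bandwise_transfer(secondary: Sequence[Spectrum],
                          primary: Sequence[Spectrum]) -> List[Tuple[float, float]]:
    """Fit (slope, offset) per band so that offset + slope * secondary ~ primary.

    Both inputs hold spectra of the same transfer samples, in the same order
    and on the same band grid.
    """
    if len(secondary) != len(primary) or len(primary) < 2:
        raise ValueError("need >= 2 paired transfer samples")
    coefficients = []
    for band in range(len(primary[0])):
        x = [s[band] for s in secondary]
        y = [p[band] for p in primary]
        mx, my = mean(x), mean(y)
        sxx = sum((xi - mx) ** 2 for xi in x)
        sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
        # Constant band on the secondary instrument: fall back to an offset-only correction.
        slope = sxy / sxx if sxx > 0 else 1.0
        coefficients.append((slope, my - slope * mx))
    return coefficients


def apply_transfer(spectrum: Spectrum,
                   coefficients: Sequence[Tuple[float, float]]) -> List[float]:
    """Map a secondary-instrument spectrum onto the primary instrument's scale."""
    return [offset + slope * value for value, (slope, offset) in zip(spectrum, coefficients)]
```

A per-band linear correction cannot fix wavelength shifts or differences in band shape between instruments, which is why windowed methods such as PDS are often used for strongly contrasting instruments.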

Publications

  • Type: Journal Articles Status: Published Year Published: 2023 Citation: Sanderman, J., Gholizadeh, A., Pittaki-Chrysodonta, Z., Huang, J., Safanelli, J. L., & Ferguson, R. (2023). Transferability of a large mid-infrared soil spectral library between two FTIR spectrometers. Soil Science Society of America Journal, 87, 586-599. 10.1002/saj2.20513
  • Type: Journal Articles Status: Published Year Published: 2023 Citation: Sanderman, J., Smith, C., Safanelli, J. L., Morgan, C. L., Ackerson, J., Looker, N., Mathers, C., Keating, R., & Kumar, A. A. (2023). Diffuse reflectance mid-infrared spectroscopy is viable without fine milling. Soil Security, 100104. 10.1016/j.soilsec.2023.100104
  • Type: Conference Papers and Presentations Status: Published Year Published: 2022 Citation: Bloom, D. E., Safanelli, J. L., Sanderman, J., Hengl, T., & Todd-Brown, K. E. (2022, December). Faster Machine Learning for MIR Soil Spectroscopy with Discrete Haar Wavelet Transform. In AGU Fall Meeting Abstracts (Vol. 2022, pp. B22I-1551).


Progress 09/01/21 to 08/31/22

Outputs
Target Audience: Our target audiences for this project remain: 1) researchers with current interests in soil spectroscopy; 2) soil testing and agronomy services, particularly in countries without good soil testing laboratories; and 3) agricultural and global change modeling communities, which have identified the lack of spatially comprehensive information on soil properties as a key constraint.
Changes/Problems: Nothing Reported
What opportunities for training and professional development has the project provided? Due to ongoing COVID-related travel restrictions, virtual working group meetings continued through most of this reporting period. Most of these meetings focused on the calibration transfer and process modeling working groups described above. We hosted a third webinar, by Preston Sorenson, titled "Mapping Soil Organic Carbon In Soil Profiles using Imaging Spectroscopy" (https://doi.org/10.5446/57308). The three educational webinars we have hosted (https://av.tib.eu/series/1078/soil+spectroscopy+for+the+global+good) have now been viewed over 300 times. A bi-monthly reading group was initiated and is regularly attended by graduate students, research assistants, and postdocs at four different institutions. The project is supporting the training and professional development of one postdoctoral researcher, Dr. Jose Lucas Safanelli, based at Woodwell Climate, and a graduate student, Dellena Bloom, at the University of Florida.
How have the results been disseminated to communities of interest? This project continues to engage a broad audience through our website (soilspectroscopy.org) with educational materials, blog posts, newsletters, and an archive of project outputs and other important literature on the topic of soil spectroscopy. The website has had more than 7,500 visitors from 133 countries since its launch 1.5 years ago. Our Twitter account (https://twitter.com/soilspec) has sent out 135 tweets on soil spectroscopy and currently has 480 followers.
What do you plan to do during the next reporting period to accomplish the goals? The focus of year 3 will be on refining the Open Soil Spectral Library (OSSL). An all-hands in-person meeting is scheduled for late October 2022, where we will finalize a major update to last year's release of the OSSL. Four new spectral libraries will be added. The estimation service will be updated to reflect what we have learned from the calibration transfer working group and from the research on data compression. We are planning a full network workshop in the spring of 2023, open to all CIN members, combining science presentations with intensive breakout sessions. Year 3 will also represent a shift towards outreach and dissemination, with the intention of holding demonstrations of the OSSL at different soil conferences, including the 2023 Tri-Society meeting.

Impacts
What was accomplished under these goals? In this past year, we completed version 1.0 of the Open Soil Spectral Library (Objective 1) and developed a series of models for inferring a range of soil properties from soil spectra (Objective 2), which are hosted on a fully functioning web application (Objective 4). Several community research projects are underway to better understand transferability between instruments (Objective 3). Education, outreach and use activities (Objective 5) have continued throughout this period. On World Soil Day (December 5, 2021), we released the first version of the Open Soil Spectral Library (https://www.woodwellclimate.org/open-soil-spectral-library/), which included all data and code on GitHub and two R Shiny apps: one for exploring the data geographically (https://explorer.soilspectroscopy.org) and one for generating predictions with standard models, which we refer to as the OSSL Engine (https://engine.soilspectroscopy.org). Since the launch, we have noticed increased interest in the project and thousands of downloads of the data, and current web traffic shows global interest in the project and its outputs. Key new developments implemented in this reporting period:

  • 3 additional SSLs (AfSIS, ISRIC-ICRAF, LUCAS) were added to the OSSL, so that it now contains enough data for global modeling;
  • the technical publication (OSSL manual) explaining how to access the OSSL and containing all necessary metadata has been updated (https://soilspectroscopy.github.io/ossl-manual/);
  • global calibration models have been updated by modeling the 1st derivative of the spectra instead of the raw data; the newly calibrated models are available at https://github.com/soilspectroscopy/ossl-models/tree/main/R-mlr (the accuracy of the v1.1 models is reported based on 5-fold cross-validation).

For any potential user to take advantage of the OSSL estimation service, we must have confidence that the uploaded spectra are compatible with the spectra hosted in the OSSL itself (Objective 3).
To this end, we organized a soil spectroscopy ring trial in which a set of 70 diverse soil samples was sent to 25 laboratories in the US and around the world. Each lab scanned these samples under its own operating conditions. The results are currently being compiled and will form the basis for recommendations on whether standardization is needed and how best to achieve it. Data analysis and reporting on the ring trial results are expected to be completed by January 2023. In a separate case study, we explored the optimal size of the calibration transfer set for different calibration transfer algorithms, transferring between the USDA NRCS Kellogg Soil Survey Laboratory MIR library and samples scanned on a secondary instrument. This research culminated in a manuscript submitted to the Soil Science Society of America Journal, which is currently under review. Another bottleneck in developing a fast and accurate estimation service is the size of the data. Mid-infrared spectra are recorded at fine resolution, resulting in 2000+ data points per spectrum; with 100,000+ spectra, this makes for a very large data frame. In support of Objective 2, we have been testing several data compression techniques, including principal component analysis and Haar wavelet analysis. Both methods appear promising: they retain useful information while reducing data size by several orders of magnitude. Our goal with data compression is to move from precalculated models, as currently hosted in version 1.0 of the OSSL, to local instance-based models. A small working group has been active over this reporting period developing a use case for soil spectroscopy in carbon cycle modeling (Objective 5).
The Soil Spectroscopy for Process Models group is building on earlier work by PI Sanderman to estimate fractions of soil organic carbon, including pyrogenic carbon, and is then using these estimated carbon fractions to help initialize process-based Earth system models in order to understand carbon trajectories under different climate and land use scenarios.
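The first-derivative preprocessing used for the v1.1 models and the Haar wavelet compression being tested for the estimation service can be illustrated with a minimal NumPy sketch. It is not the project's R code, and several parts are assumptions: the derivative is a plain finite difference from `numpy.gradient` (the OSSL models may use a smoothed derivative such as Savitzky-Golay), the 2048-point spectral grid and synthetic band are hypothetical, and the compression keeps only the Haar approximation coefficients, which requires the spectrum length to be divisible by 2**levels (real spectra may need padding or trimming first).

```python
import numpy as np


def first_derivative(spectrum: np.ndarray, spacing: float) -> np.ndarray:
    """Finite-difference first derivative along the wavenumber axis."""
    return np.gradient(np.asarray(spectrum, dtype=float), spacing)


def haar_compress(signal: np.ndarray, levels: int) -> np.ndarray:
    """Keep only the Haar approximation coefficients after `levels` steps.

    Each step replaces neighbouring pairs by their scaled sum (a + b) / sqrt(2),
    halving the length; the detail coefficients (a - b) / sqrt(2) are discarded.
    """
    coeffs = np.asarray(signal, dtype=float)
    if coeffs.size % (2 ** levels) != 0:
        raise ValueError("signal length must be divisible by 2**levels")
    for _ in range(levels):
        coeffs = (coeffs[0::2] + coeffs[1::2]) / np.sqrt(2.0)
    return coeffs


def haar_expand(coeffs: np.ndarray, levels: int) -> np.ndarray:
    """Approximate inverse: rebuild the signal with all details set to zero."""
    signal = np.asarray(coeffs, dtype=float)
    for _ in range(levels):
        signal = np.repeat(signal, 2) / np.sqrt(2.0)
    return signal


# Hypothetical MIR grid: 2048 points from 4000 to 600 cm-1 with one synthetic band.
wavenumbers = np.linspace(4000, 600, 2048)
spectrum = np.exp(-(((wavenumbers - 2900) / 150) ** 2))
deriv = first_derivative(spectrum, spacing=wavenumbers[1] - wavenumbers[0])
compressed = haar_compress(deriv, levels=5)  # 2048 -> 2048 / 2**5 = 64 values
restored = haar_expand(compressed, levels=5)
print(deriv.shape, compressed.shape, restored.shape)  # (2048,) (64,) (2048,)
```

Each level halves the number of values, so five levels reduce 2048 points to 64; how many levels can be applied before useful information is lost is what compression experiments of this kind have to establish. PCA, the other technique being tested, would instead learn its basis from the spectral library itself.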

Publications

  • Type: Journal Articles Status: Published Year Published: 2022 Citation: Shepherd KD, Ferguson R, Hoover D, van Egmond F, Sanderman J, Ge Y (2022) A global soil spectral calibration library and estimation service. Soil Security, 7, 100061. https://doi.org/10.1016/j.soisec.2022.100061
  • Type: Journal Articles Status: Published Year Published: 2022 Citation: Garrett LG, Sanderman J, Palmer DJ, Dean F, Jeram S, Bridson JH, Carlin T (2022) Mid-infrared spectroscopy for planted forest soil and foliage nutrition predictions, New Zealand case study. Trees, Forests and People, 100280. https://doi.org/10.1016/j.tfp.2022.100280
  • Type: Conference Papers and Presentations Status: Published Year Published: 2021 Citation: Van Egmond F, Ferguson R, Shepherd KD, Peng Y, Sanderman J, Caon L, Hoover D (2021) A global soil spectral calibration library and estimation service. ASA-CSSA-SSSA International Annual Meeting. November 2021
  • Type: Conference Papers and Presentations Status: Published Year Published: 2022 Citation: Shepherd KD et al. (2022) A global soil spectral calibration library and estimation service. World Congress of Soil Science. August 2022.
  • Type: Websites Status: Published Year Published: 2021 Citation: https://explorer.soilspectroscopy.org/
  • Type: Websites Status: Published Year Published: 2021 Citation: https://engine.soilspectroscopy.org/
  • Type: Other Status: Published Year Published: 2021 Citation: The Open Soil Spectral Library Manual (2021) https://soilspectroscopy.github.io/ossl-manual/


Progress 09/01/20 to 08/31/21

Outputs
Target Audience: There are several target audiences for this project: 1) researchers with current interests in soil spectroscopy, who will come together to form the CIN network; 2) soil testing and agronomy services, particularly in countries without good soil testing laboratories; 3) agricultural and global change modeling communities, which have identified the lack of spatially comprehensive information on soil properties as a key constraint. In this first year of the project, our primary target audience has been the research and soil testing lab communities that make up the network.

Changes/Problems: COVID-related remote work has definitely slowed the first year of this project. The original plan placed significant emphasis on intense in-person workshops and coding hackathons. Given the inability to travel in the first year, we shifted some of the participant support budget towards a greater role for our technical implementation collaborator at OpenGeoHub. This has allowed OpenGeoHub to contract a front-end/back-end group to develop the R Shiny interface for the OSSL. Another slowdown was that Woodwell Climate lost the postdoc hired to work directly with PI Sanderman. We have identified a new scientist to fill this role starting in January 2022, but there will have been a nearly 6-month gap in full-time postdoctoral support on this project. All co-PIs have stepped up to maintain network communications in the absence of this coordinating role.

What opportunities for training and professional development has the project provided? Although COVID-19 prevented the network from gathering in person, this project has hosted 11 virtual meetings covering different topics in soil spectroscopy, each attended by between 9 and 32 people. The project has hosted two educational webinars on important topics in soil spectroscopy (Wadoux, A. 2021: "Estimating soil properties with spectral data", https://doi.org/10.5446/52954; and Ramirez-Lopez, L. 2021: "Getting accurate predictions from large and diverse spectral libraries", https://doi.org/10.5446/52955), with recordings of both webinars available through our website. These recordings have been viewed a total of 116 times.

How have the results been disseminated to communities of interest? Several peer-reviewed publications have resulted from research activities in the first year of this project, primarily focused on calibration transfer (Objective 3). PI Sanderman has given virtual presentations at the 2020 Tri-Society Meeting and the American Geophysical Union Fall Meeting focusing on the overall goals of this network project. PI Sanderman also gave an invited talk on emerging approaches for measuring soil carbon as part of a USDA Climate Hub / AGU webinar series featuring research from this project, which has been viewed over 740 times (https://www.youtube.com/watch?v=vrtru5wzwgQ&t=2s). Four newsletters have gone out to network members and to the broader soil spectroscopy community via our website. The project has established a website (soilspectroscopy.org), which serves as both an educational platform and a means of disseminating project updates and findings. The website has received >1800 visitors this calendar year. The project also maintains a Twitter handle (@soilspec), which disseminates project findings and updates and serves as a resource for new literature in soil spectroscopy. The Twitter account has sent out 75 tweets and currently has 230 followers.

What do you plan to do during the next reporting period to accomplish the goals? The focus of year 2 will be on completing and launching the Open Soil Spectral Library (OSSL) Explorer and Estimation Service. The OSSL Explorer is almost ready for public release, and the efforts of the network will shift to developing optimal calibration models for working with this database. The interlaboratory ring trial will be completed and analyzed to develop optimal transfer methods that will enable many more labs to take advantage of the large spectral libraries included in the OSSL. We hope that, as the COVID situation improves, we will be able to host our first in-person meetings and workshops, which will accelerate progress towards achieving these goals.

Impacts
What was accomplished under these goals? The primary objective of this project is to share knowledge and build a community of practice with a common goal of increasing the availability of soil data for numerous applications. The project has built a network of >100 current or potential soil spectroscopy users. A series of virtual meetings laid the foundation for four working groups tasked with progressing and delivering on the five objectives of this project. In addition to network meetings, three virtual presentations were given and two webinars were organized to share best practices in soil spectroscopy. A website (soilspectroscopy.org) was built to facilitate networking and to disseminate knowledge on soil spectroscopy. The largest efforts this past year have been in designing the data infrastructure to host spectral libraries and a spectral estimation service (Objective 4) and in developing the Open Soil Spectral Library (Objective 1). Good progress was also made on understanding the transferability of calibration models (Objective 3).

Progress on Objective 1: We have invested a significant amount of time in developing and implementing import and reformatting functions to enable easy and universal import of international soil spectral datasets. We started with the NRCS KSSL dataset, which was originally delivered as a Microsoft Access database, with spectral reflectances (MIR and VisNIR) as separate files. We have created R code (R Markdown) to import the data, clean up and check values, and then bind all data into a universal format of four tables: soil laboratory data, soil site data, MIR spectra and VisNIR spectra. Using standardized column names (listed at https://soilspectroscopy.github.io/ossl-manual/database.html#database) and structures was crucial to allow easy binding of datasets coming from different projects (e.g. AfSIS, LUCAS, ICRAF-ISRIC and similar).
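Why standardized column names make binding easy can be shown with a small, pure-Python sketch: each source dataset gets its own mapping onto one shared schema (including any unit conversion), after which combining datasets is simply stacking rows. Everything here is hypothetical: the dataset names, source columns and standard column names are placeholders rather than the OSSL's actual names (those are listed in the OSSL manual), and the project's real import code is written in R.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]

# Hypothetical standard schema (placeholders, not the real OSSL column names).
STANDARD_COLUMNS: List[str] = ["sample_id", "dataset", "latitude", "longitude", "oc_pct"]

# Hypothetical per-dataset mappings: source column -> (standard column, divisor).
COLUMN_MAPS: Dict[str, Dict[str, Tuple[str, float]]] = {
    "dataset_a": {"SampleID": ("sample_id", 1.0), "Lat": ("latitude", 1.0),
                  "Lon": ("longitude", 1.0), "OC_percent": ("oc_pct", 1.0)},
    # dataset_b reports organic carbon in g/kg; dividing by 10 gives percent.
    "dataset_b": {"id": ("sample_id", 1.0), "y": ("latitude", 1.0),
                  "x": ("longitude", 1.0), "oc_gkg": ("oc_pct", 10.0)},
}


def standardize(dataset: str, rows: List[Row]) -> List[Row]:
    """Rename (and rescale) one dataset's columns to the shared schema."""
    mapping = COLUMN_MAPS[dataset]
    standardized: List[Row] = []
    for row in rows:
        out: Row = {col: None for col in STANDARD_COLUMNS}  # absent values stay None
        out["dataset"] = dataset
        for src_col, value in row.items():
            if src_col not in mapping:
                continue  # columns without a standard equivalent are dropped
            std_col, divisor = mapping[src_col]
            out[std_col] = value / divisor if divisor != 1.0 else value
        standardized.append(out)
    return standardized


def bind(*tables: List[Row]) -> List[Row]:
    """Stack standardized tables row-wise; shared columns make this trivial."""
    return [row for table in tables for row in table]


a = standardize("dataset_a", [{"SampleID": "A1", "Lat": 42.5, "Lon": -72.1, "OC_percent": 1.8}])
b = standardize("dataset_b", [{"id": "B7", "y": -1.3, "x": 36.8, "oc_gkg": 25.0, "notes": "n/a"}])
combined = bind(a, b)
print([row["oc_pct"] for row in combined])  # [1.8, 2.5]
```

An unmapped dataset name raises a `KeyError`, which is arguably the right behavior: a dataset should not enter the combined library until its columns have been mapped.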
Once the data had been imported and bound using standardized column names (we call the compiled dataset the "Open Soil Spectral Library"), it was imported into a MongoDB database installed on a dedicated server, allowing thousands of users to access it at the same time (including developers, through an R MongoDB package and the REST API). In addition, the OSSL dataset was made available via a simple R Shiny-based viewer at http://explorer.soilspectroscopy.org:8100/. In the coming months we plan to release the first version of the OSSL together with a manual and an R package allowing easy access to the data (modeled on the R packages "rgbif", "openair" and "GSODR"). Together with the launch of the OSSL, we will invite international and national organizations to contribute at least part of their SSLs to the OSSL and help build open tools for the global good. Import, clean-up and standardization of the KSSL library has been quite cumbersome, because the Microsoft Access database consists of hundreds of tables, not all of which are used. Thanks to the kind assistance of NRCS staff, we have finally managed to import and organize the most up-to-date version of the KSSL and add all missing columns, such as the Open Location Code, geographical coordinates, etc. The remaining open soil spectral datasets, such as AfSIS, ICRAF-ISRIC and LUCAS, will be gradually imported and added to the OSSL, so that the total library should eventually contain over 200,000 samples and cover almost four continents, making it applicable for global modeling. Each dataset takes a lot of time to understand and import because most use different standards and naming conventions.

Progress on Objective 2: While not a main focus of year 1, we have tested building models in R and then using these models in a REST API framework to serve predictions.
Such web services will later be used to build interfaces and semi-automated frameworks that calibrate soil spectral signatures for hundreds of users at the same time. In addition, we have tested using Amazon Simple Storage Service (S3) for sharing calibration models (exported as RDS files and registered with a DOI), so that the community can share their models without needing to share all calibration data. It is quite possible that there will be more contributions to the models than to the OSSL, especially from organizations that have neither the permission nor the capacity to share original raw data. An R Shiny-based interface to allow calibration of raw spectral scans is currently being developed.

Progress on Objective 3: For a new spectroscopy lab to take advantage of existing spectral libraries, there needs to be confidence that the spectra produced in that new lab would be the same as if produced in the primary lab that developed the spectral library. This project continues to test different methods of spectral harmonization and calibration transfer through experiments involving the exchange of samples between laboratories. In the visible-near infrared range, Gholizadeh et al. (2021) demonstrated that the use of an internal soil standard could align VNIR spectra from four different labs. Dangal and Sanderman (2020) demonstrated that piecewise direct standardization could align a secondary MIR instrument with the USDA KSSL MIR library. In Pittaki-Chrysodonta et al. (2021), we tested different calibration transfer algorithms for using the USDA KSSL library on a lower-cost MIR spectrophotometer and found that spectral space transformation was the best and most efficient approach (in terms of requiring the fewest transfer samples). Currently, a large inter-laboratory ring trial is in the final phase of preparation: a common set of 70 soil samples will be sent to 20 different spectroscopy labs that are part of the network.
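Piecewise direct standardization (PDS), the approach tested by Dangal and Sanderman (2020), can be sketched in NumPy as follows. A transfer set of samples is scanned on both instruments; for each wavelength of the primary instrument, a small regression is fitted on a window of neighbouring wavelengths from the secondary instrument, and the fitted models then map any new secondary spectrum into the primary instrument's space. This is a minimal sketch under stated assumptions: it uses ordinary least squares with an intercept in each window, whereas published PDS implementations usually fit PCR or PLS per window because transfer sets are small; the function names, window half-width and synthetic data are hypothetical; and it does not implement the spectral space transformation that Pittaki-Chrysodonta et al. (2021) found most efficient.

```python
import numpy as np


def fit_pds(primary: np.ndarray, secondary: np.ndarray, half_window: int = 2):
    """Fit piecewise direct standardization from paired transfer spectra.

    primary, secondary: (n_samples, n_wavelengths) spectra of the SAME samples
    measured on the primary and the secondary instrument.
    Returns one (column slice, coefficients, intercept) tuple per wavelength.
    """
    n_samples, n_wavelengths = primary.shape
    models = []
    for j in range(n_wavelengths):
        lo, hi = max(0, j - half_window), min(n_wavelengths, j + half_window + 1)
        # Design matrix: secondary-instrument window plus a column of ones (intercept).
        design = np.hstack([secondary[:, lo:hi], np.ones((n_samples, 1))])
        coef, *_ = np.linalg.lstsq(design, primary[:, j], rcond=None)
        models.append((slice(lo, hi), coef[:-1], coef[-1]))
    return models


def apply_pds(models, secondary: np.ndarray) -> np.ndarray:
    """Map secondary-instrument spectra into the primary instrument's space."""
    transferred = np.empty(secondary.shape, dtype=float)
    for j, (cols, weights, intercept) in enumerate(models):
        transferred[:, j] = secondary[:, cols] @ weights + intercept
    return transferred


# Synthetic check: the secondary instrument has a gain of 0.9 and an offset of 0.05.
rng = np.random.default_rng(42)
transfer_primary = rng.normal(size=(20, 30))  # 20 transfer samples, 30 wavelengths
transfer_secondary = 0.9 * transfer_primary + 0.05
models = fit_pds(transfer_primary, transfer_secondary)

new_primary = rng.normal(size=(5, 30))
new_secondary = 0.9 * new_primary + 0.05
corrected = apply_pds(models, new_secondary)
print(np.abs(corrected - new_primary).max() < 1e-8)  # True
```

With plain least squares, each window of 2 × half_window + 1 wavelengths plus an intercept needs at least that many transfer samples (six here) for a unique fit, which is one reason the size of the transfer set matters. Real differences between instruments are far less tidy than the uniform gain and offset simulated here.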
Progress on Objective 4: The database is hosted on a dedicated server used only for the SoilSpec4GG project (currently a server with 64 GB RAM, 16 threads and a 2 TB SSD). This infrastructure should be sufficient for a few hundred users accessing the data in parallel. We have installed everything using Docker, so that it can easily be scaled up or migrated to the cloud if interest in the OSSL increases. The OSSL, REST API and R package should eventually become a fully fledged solution that allows users to: register; upload their spectra, select soil properties and receive predictions of the OSSL soil properties, including uncertainty, or an analysis report; upload and share soil spectroscopy models using a DOI (so they can get credit for their work); and develop applications on top of the OSSL using the open API. The REST API will be released at the same time as the OSSL Explorer. The MongoDB database is currently available only to project partners and will need to be secured (e.g. through authentication) before we can expose it publicly; otherwise, malicious users could potentially block or overload the services.

Publications

  • Type: Journal Articles Status: Published Year Published: 2021 Citation: Sanderman, J., Savage, K., Dangal, S.R.S., Duran, G., et al. (2021). Can agricultural management induced changes in soil organic carbon be detected using mid-infrared spectroscopy? Remote Sensing, 13, 2265. https://doi.org/10.3390/rs13122265
  • Type: Journal Articles Status: Published Year Published: 2021 Citation: Gholizadeh, A., Neumann, C., Chabrillat, S., van Wesemael, B., et al. (2021). Soil organic carbon estimation using VNIR-SWIR spectroscopy: The effect of multiple sensors and scanning condition. Soil and Tillage Research, 211, 105017. https://doi.org/10.1016/j.still.2021.105017
  • Type: Journal Articles Status: Published Year Published: 2021 Citation: Pittaki-Chrysodonta, Z., Hartemink, A., Sanderman, J., Ge, Y., Huang, J. (2021). Evaluation of three calibration transfer methods for predictions of soil properties using mid-infrared spectroscopy. Soil Science Society of America Journal. https://doi.org/10.1002/saj2.20225
  • Type: Journal Articles Status: Published Year Published: 2020 Citation: Dangal, S.R.S., Sanderman, J. (2020). Is standardization necessary for sharing a large mid-infrared soil spectral library? Sensors, 20, 6729. https://doi.org/10.3390/s20236729
  • Type: Conference Papers and Presentations Status: Published Year Published: 2020 Citation: Sanderman, J., Todd-Brown, K.E., Hengl, T., Dangal, S.R.S., et al. (2020). Spectroscopy to fill the soil data gap. ASA-CSSA-SSSA International Annual Meeting. November 2020. https://scisoc.confex.com/scisoc/2020am/prelim.cgi.Paper.131585
  • Type: Conference Papers and Presentations Status: Published Year Published: 2020 Citation: Sanderman, J., Dangal, S.R.S., Todd-Brown, K.E., Hengl, T., et al. (2020). Filling the soil data gap. American Geophysical Union Fall Meeting. December 2020. http://agu.confex.com/agu/fm20/meetingapp.cgi/Paper/713137
  • Type: Websites Status: Published Year Published: 2020 Citation: soilspectroscopy.org
  • Type: Conference Papers and Presentations Status: Published Year Published: 2021 Citation: Sanderman, J. (2021). Emerging measurement technologies for soil carbon. ISCN / AGU / USDA Climate Hub Webinar series. https://www.youtube.com/watch?v=vrtru5wzwgQ&t=1885s