Source: UNIV OF HAWAII submitted to NRP
DSFAS: SOIL HEALTH FINGERPRINTING: RAPIDLY PREDICTING SOIL HEALTH IN A DIVERSITY OF SOILS USING MACHINE LEARNING
Sponsoring Institution
National Institute of Food and Agriculture
Project Status
ACTIVE
Funding Source
Reporting Frequency
Annual
Accession No.
1030595
Grant No.
2023-67021-40005
Cumulative Award Amt.
$649,570.00
Proposal No.
2022-11592
Multistate No.
(N/A)
Project Start Date
Sep 15, 2023
Project End Date
Sep 14, 2028
Grant Year
2023
Program Code
[A1541]- Food and Agriculture Cyberinformatics and Tools
Recipient Organization
UNIV OF HAWAII
3190 MAILE WAY
HONOLULU,HI 96822
Performing Department
(N/A)
Non Technical Summary
Our team recently created a tool that provides robust soil health assessments to help land stewards work towards their sustainability goals and support healthy communities. However, our current suite of soil health indicators is costly and time-consuming to measure. While we also have novel data resources such as soil spectra and microbiomes at our fingertips, these data have yet to be integrated into our soil health database, predictions, or education. Therefore, we propose to identify novel data resources that allow for more affordable, rapid, and comprehensive soil health testing in direct response to the Data Science for Food and Agriculture Systems call to "synthesize or analyze existing data and resources on soil health." We will first integrate these novel data streams into our current soil health database, and then test whether these new data resources can help us accurately predict soil health using machine and deep learning models. Finally, we will train undergraduate and graduate students through formalized data science internships, coursework, and assistantships. Through this project, we will help to develop a deeper understanding of the relationship between the soil fingerprint (i.e., microbiome and spectroscopy) and its health status, an improved tool to effectively measure soil health, and a skilled workforce. Our combined expertise creates a unique opportunity to leverage existing resources and make advancements in research and education that will benefit students, land stewards, the next generation of researchers, and the community at large.
Animal Health Component
33%
Research Effort Categories
Basic
34%
Applied
33%
Developmental
33%
Classification

Knowledge Area (KA) | Subject of Investigation (SOI) | Field of Science (FOS) | Percent
102 | 0110 | 1070 | 100%
Knowledge Area
102 - Soil, Plant, Water, Nutrient Relationships;

Subject Of Investigation
0110 - Soil;

Field Of Science
1070 - Ecology;
Goals / Objectives
Our first goal is to harmonize existing data from several ongoing, interconnected research efforts in order to develop machine learning models that predict the health of a diversity of soil types, including but not limited to tropical and volcanic soils. We will first integrate three unique data streams into an innovative database: conventional soil health indicators, mid-infrared spectroscopy, and soil microbiomes. We will then explore supervised machine learning techniques and deep learning models to predict soil health based on the high-dimensional data. These predictive models will become available and accessible through their integration into our existing user-based Hawaii Soil Health Webtool platform, which supports monitoring efforts and adaptive management for the aggradation of soils and climate-ready landscapes.
Our second goal is to build up the capacity for place-appropriate data science. Given that community action is needed to protect, restore, and improve landscape health and resilience, we rely on the next generation of professionals to prioritize soil health through the implementation of evidence-based and equitable solutions. We will train undergraduate and graduate students through formalized internships, coursework, and assistantships in order to provide the skills needed to monitor soil health and soil-related ecosystem services using advanced data science techniques. However, training and monitoring efforts must be culturally appropriate. Therefore, we will build ethics training associated with indigenous ways of being into all student training activities to support emerging place-appropriate data science practices.
We propose three supporting objectives to help achieve these goals and outcomes:
1. To Harmonize Soil Health Data Streams - Engineer existing data streams to facilitate artificial intelligence and deep learning.
2. To Develop Deep Learning Models - Predict soil health scores using machine learning models that incorporate high-dimensional data for improved soil health assessments.
3. To Build Pathways that Integrate Data Science into the Classroom and Beyond - Train undergraduate and graduate students in data science and data ethics through classroom instruction, internship opportunities, and assistantships.
Project Methods
A deeper understanding of the relationship between the soil fingerprint (microbiome and spectroscopy) and its health status will enable more routine and cost-effective soil health testing to help elucidate management systems that enhance soil health, food production, and environmental outcomes. To achieve this, we will (i) apply our recently released soil health scoring function to all samples entering the Hawaii Soil Health Webtool database to produce soil health scores and classes, (ii) apply mid-infrared spectra-based machine learning algorithms (i.e., artificial neural networks, random forest, and memory-based learning) to predict soil health indicators and soil health scores, (iii) utilize measures of microbiome community composition and topological characteristics of cross-domain networks to inform machine learning models, and (iv) assess the ability of different machine learning algorithms to classify soil health based on these new high-resolution data streams.
We will evaluate our progress by the achievement of specific milestones, including a soil health database with up-to-date soil health scores (year 1), a harmonized database with high-resolution data streams (i.e., spectroscopy and microbiome) (year 2-3), the evaluation of machine learning algorithms to predict soil health from multiple data streams (year 3-5), the matriculation of two graduate students (year 4-5), and the publication of 5 papers (year 3-5+).
We also aim to help students build the skills needed to become competitive in the workforce by using novel methods to solve problems in data-driven agriculture and conservation. Students will also receive explicit training in data ethics to guide the application of machine learning and work towards ethical and equitable soil health assessments. Our methods will include formal classroom instruction, development of curriculum or innovative teaching methodologies, internships, and extension and outreach.
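As a rough illustration of step (ii), the sketch below shows the simplest form of memory-based learning: a new spectrum is predicted from its nearest neighbors in a reference library of samples with measured values. This is not the project's pipeline. The function names, the four-point "spectra", and the indicator values are all hypothetical, and real mid-infrared spectra have hundreds to thousands of wavenumbers and are usually preprocessed first.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length spectra."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def predict_memory_based(library, query_spectrum, k=3):
    """Predict an indicator value from the k most similar library spectra.

    library: list of (spectrum, measured_value) pairs (hypothetical data).
    The prediction is an inverse-distance-weighted mean of the neighbors.
    """
    if not 1 <= k <= len(library):
        raise ValueError("k must be between 1 and the library size")
    # Rank the whole library by spectral distance to the query sample.
    ranked = sorted(library, key=lambda item: euclidean(item[0], query_spectrum))
    neighbors = ranked[:k]
    # Closer spectra get larger weights; the small constant avoids division by zero.
    weights = [1.0 / (euclidean(s, query_spectrum) + 1e-9) for s, _ in neighbors]
    weighted_sum = sum(w * value for w, (_, value) in zip(weights, neighbors))
    return weighted_sum / sum(weights)


# Hypothetical library: absorbance at four wavenumbers, paired with a measured indicator.
library = [
    ([0.10, 0.20, 0.30, 0.40], 1.0),
    ([0.12, 0.22, 0.31, 0.42], 1.2),
    ([0.50, 0.60, 0.70, 0.80], 4.0),
    ([0.52, 0.61, 0.69, 0.83], 4.4),
]

prediction = predict_memory_based(library, [0.11, 0.21, 0.30, 0.41], k=2)
print(round(prediction, 3))
```

The query spectrum sits close to the first two library entries, so the prediction falls between their measured values. A fuller memory-based approach would typically fit a local regression on the selected neighbors rather than averaging them.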
We will evaluate educational activities and track metrics relevant to our objectives with IRB approval as needed. To assess performance, we will utilize a survey instrument developed by our research group that includes a set of questions assessing students' interest and knowledge levels before and after training, level of confidence in subject ability, and experience with course elements (e.g., group project). Specifically, the survey will address soil health tools and ethical concepts in data science, including soil health indicators, proxy measurements, database curation, machine learning, Native Hawaiian history, and data ethics. We will also seek student evaluations of education materials. For TPSS 333, we will ask students to evaluate our course through the Official University of Hawaii Course and Faculty Evaluation System (eCAFE), which is an entirely online evaluation system. For the internships, we will adapt and disseminate the Google Form evaluation currently used in our ongoing USDA REEU program. Finally, we will consider workforce placement of students involved in the project. We will consult two large potential local employers, as part of our advisory group (described in our Management Plan), for their input on the quality of skills the students acquire. We will also work with our undergraduate program to track the placement of students enrolled in our program. We will apply these methods in Year 3-5 of the project.

Progress 09/15/23 to 09/14/24

Outputs
Target Audience:
Partnership for Climate-Smart Commodities: As an integral part of the Hawai'i Partnership for Climate Smart Commodities, our data science team interacts with the following Producer Engagement Teams, which represent various stakeholder and commodity groups on the islands: (1) Hawai'i Cattlemen's Council, (2) O'ahu Resource Conservation and Development Council, (3) O'ahu Agriculture and Conservation Association, (4) Office of Indigenous Knowledge and Innovation, (5) Forest Solutions, (6) The Kohala Center, and (7) Hawai'i Farms Union United. These organizations are the liaison groups representing collaborating growers and land managers. Our team members are active participants on the following committees of the Partnership: Producer Impact, Standard Operating Procedures, Data, MMRV, and USDA Data Reporting. Since May 2023, our team has joined multiple meetings per week.
Producers: During the last reporting period, our data science team also supported the USDA-funded soil health farmer cohort program led by the O'ahu Resource Conservation and Development Council. We provided 7 one-on-one sessions to participating growers on interpreting their soil health testing reports from Jan 24 to 31, 2024. We also engaged with a local farm (name withheld for privacy concerns) on a soil fertility and soil health educational project to develop management plans using project data, which were presented to the farmer on May 6, 2024.
Government: Our data science team also met with federal government agencies to provide an overview of our soil health tools. This includes meetings with the USDA NRCS local, regional, and national offices and ARS. Two meetings took place during this project reporting period, on Sept 14, 2023, and June 20, 2024. We also reached out to an ARS unit focused on soil health measurements during this reporting period, and we scheduled a meeting for October 22, 2024.
Non-profits: Our data science team has directly engaged and partnered with two community-based organizations (names withheld for privacy concerns). We held meetings directly focused on data science policy, needs, and management on Feb 16, 2023, Oct 13, 2023, and Oct 29, 2024.
Agribusiness: We have also engaged directly with commercial entities. The first is a soil testing start-up (name withheld for privacy concerns), which is interested in using our soil health scoring function applied to FTIR measurements. We have met with the university about the commercialization potential of our tool throughout the reporting period. We are also in the process of transitioning to host our soil health webtool with a local secure, multiplatform, location-based software company. This platform is popular with producers, and it will allow us to interact with producers, accept soil samples, and report results.
Researchers and academics: Our team presented our first-year data science findings at the 2023 Soil Science Society of America Conference (one oral and one poster presentation) and the 11th BIOGEOMON International Symposium on Ecosystem Behavior (one poster presentation). We also submitted 5 abstracts to the 2024 Soil Science Society of America Conference, and our work was featured in a keynote talk by co-PI Susan Crow. The project PI (Maaz) also attended the NIFA Project Directors Meeting in Manhattan, KS, and was invited to present the soil health data science tools as part of the Climate-Smart Partnership Seminar Series and as a panelist for "CTAHR Introduces New Tools and Programs to Help Farmers" at the Hawai'i Ag2024 Conference.
Changes/Problems: No major changes to the approach or challenges. The work has been surprisingly smooth and rewarding. However, one upcoming concern is the timing of the data science internship component of this grant. 
Based on our work with the Information and Computer Sciences department, hosting interns throughout the year may be more desirable than hosting them only during the summer. However, because we requested summer internships, we may need to request formal permission to host undergraduate interns throughout the year.
What opportunities for training and professional development has the project provided?
Currently, we are training 1 post-doctoral researcher, 1 post-graduate, 1 PhD student, and 3 undergraduates. We recruited one of these undergraduate students from our previous internship program, and he is developing the Shiny app for soil health scoring. Since January 2024, our data science group has held a weekly writing workshop to work on manuscripts, proposals, and theses. The paper that we submitted as part of this grant came from this workshop. In total, workshop participants have generated two manuscripts and one thesis. We now have four papers and two proposals in preparation. As mentioned in the accomplishments section, we launched TPSS 333, which is a course designed for natural science students to gain data science skills. We modeled this course on our previous data science internship program (NIFA workforce development grant), for which we hosted our final cohort during this reporting period. This grant and program are supporting the professional development of postdocs and graduate students, and they supported the following conference submissions to the SSSA meeting after only one year of the project:
Slanzon, G., McClellan Maaz, T., Crow, S. E., Deenik, J. L., Kantar, M., & Nguyen, N. (2024) Connecting Fungal-Bacterial Interactions and Soil Health in Tropical Soils [Abstract]. ASA, CSSA, SSSA International Annual Meeting, San Antonio, TX. https://scisoc.confex.com/scisoc/2024am/meetingapp.cgi/Paper/158436
Shor, H., Beckstrom, T. B., Estrada, K., McClellan Maaz, T., Deenik, J. L., Reyes, N., Mix, S., Loo, M. K., Sotomayor, D., Lawrence, D., Tallamy Glazer, C. J., & Crow, S. E. (2024) Unearthing Trends: Optimizing Tropical Soil Health Scoring Methods [Abstract]. ASA, CSSA, SSSA International Annual Meeting, San Antonio, TX. https://scisoc.confex.com/scisoc/2024am/meetingapp.cgi/Paper/160102
Estrada, K., McClellan Maaz, T., Beckstrom, T. B., Deenik, J. L., Reyes, N., Mix, S., Loo, M. K., Tallamy Glazer, C. J., Satdichanh, M., Lawrence, D., Rivera Zayas, J., Ticktin, T., Sotomayor, D., & Crow, S. E. (2024) The Dirt on Disturbance: Creating an Anthropogenic Disturbance Index to Objectively Capture Overlap across and within Land Use Types for Tropical and Subtropical Agroecosystems [Abstract]. ASA, CSSA, SSSA International Annual Meeting, San Antonio, TX. https://scisoc.confex.com/scisoc/2024am/meetingapp.cgi/Paper/159553
Beckstrom, T. B., Estrada, K., McClellan Maaz, T., Deenik, J. L., Reyes, N., Mix, S., Loo, M. K., Sotomayor, D., Lawrence, D., Tallamy Glazer, C. J., & Crow, S. E. (2024) Assessing (sub)Tropical Soil Health Along a Disturbance Gradient in Hawai'i, Puerto Rico, and Pohnpei [Abstract]. ASA, CSSA, SSSA International Annual Meeting, San Antonio, TX. https://scisoc.confex.com/scisoc/2024am/meetingapp.cgi/Paper/159558
Kiehl, K. D., Fullmer, C., Nguyen, N., McClellan Maaz, T., Crow, S. E., & Deenik, J. L. (2024) A Microbial Proxy for Mineralizable Nitrogen [Abstract]. ASA, CSSA, SSSA International Annual Meeting, San Antonio, TX. https://scisoc.confex.com/scisoc/2024am/meetingapp.cgi/Paper/162330
How have the results been disseminated to communities of interest?
Below are the ways in which results were disseminated to each audience.
Partnership for Climate-Smart Commodities: We have disseminated results via a seminar series and various biweekly meetings. 
Producers: We have disseminated results to producers via seven one-on-one training sessions and by generating dozens of soil health reports that provide soil health scores. We are actively working with the new software company to generate reports via their reporting platform.
Government: We have disseminated results via three oral presentations and three two-page briefings.
Non-profits: We have disseminated results via two oral presentations with accompanying printed handouts.
Agribusiness: We have not disseminated results to agribusiness.
Researchers and academics: We have disseminated results at two conferences and one meeting via poster and oral presentations. We have submitted one publication for peer review.
What do you plan to do during the next reporting period to accomplish the goals?
Objective 1: To Harmonize Soil Health Data Streams - This objective is largely completed. However, we aim to continue data processing for soil samples entering our three data streams. We will publicly release the Shiny app for soil health scoring during the next reporting period.
Objective 2: To predict soil health scores using machine learning models that incorporate high-dimensional data for improved soil health assessments - This objective's activities will be the focus of our next reporting period. As our machine learning efforts progress, we will add models and pipelines to our repository, including deep learning scripts for soil health prediction. We aim to make sections of the repository public by the next reporting period to share reproducible pipelines with the wider scientific community and stakeholders.
Objective 3: To Build Pathways that Integrate Data Science into the Classroom and Beyond - We will continue training postdocs, graduate students, and undergraduates. We will launch our first undergraduate internship experience. 
In Spring 2025, we will also collaborate with professors across campus (one in sustainability, one in hydrology, one in oceanography, one in plant science, and one in natural resource management) to develop case studies for the data ethics lesson.

Impacts
What was accomplished under these goals? 1. To Harmonize Soil Health Data Streams - Engineer existing data streams to facilitate artificial intelligence and deep learning. Our team was able to successfully harmonize all soil health data streams during this first reporting period, which currently exist in the correct data structures and locations. We now have three databases as described in the first three activities of Objective 1, and are actively in the process of constructing a relational database (final activity of Objective 1), as described below. Objective 1, Activity 1: In summary, we have tripled our soil health database since this grant was submitted, and now have over 1500 samples with soil health indicator data and scores. During this reporting period, we also tested three iterations of the soil health scoring function, which we found to be highly reliable and stable. We presented the results of the second iteration at a conference (McClellan Maaz, T., Crow, S. E., Deenik, J. L., Loo, K., Tallamy-Glazer, C. J., & Beckstrom, T. B. 2023. Measuring the Immeasurable: A Structural Equation Modeling Approach to Conceptualizing and Scaling up Soil Health Assessments [Abstract]. ASA, CSSA, SSSA International Annual Meeting, St. Louis, MO.). We are developing a manuscript outlining the results of the third iteration. Objective 1, Activity 2: We also finished constructing the microbiome pipeline and processed data for fungal ITS (500 samples) and bacterial and archaeal 16S (703 samples). Objective 1, Activity 3: We finalized the FTIR pipeline and created a machine learning algorithm to relate spectral data to soil health indicator values and scores. Together with Activity 2, we fully implemented and documented microbiome and FTIR pipelines for analyzing soil samples. These workflows are accessible and executable on High-Performance Computing (HPC) clusters using the portable GitHub structure. 
Objective 1, Activity 4: During this process, we created scripts to join microbiome and FTIR data streams using unique soil sample identifiers. The harmonized data now integrate multiple sources and are ready for machine learning model training. To accomplish this task, we performed extensive data cleaning to ensure that each soil ID was accurate and accessible. We are currently building a relational database to organize, store, and manage this large and complex soil health dataset, including microbiome data, FTIR spectral data, soil indicator data, and scores. Furthermore, we are refining the controlled vocabulary used for collecting the metadata associated with the soil health tool, and aligning this vocabulary as we transition to hosting the tool on the new software platform.
During this last reporting period, we devoted considerable time to establishing a collaborative remote repository for sharing and maintaining reproducible workflows for bioinformatics, soil health scoring, and machine learning model development. We organized the repository structure with clear folder hierarchies for each data stream (microbiome, FTIR, soil health indicators). All scripts utilize relative file paths to ensure portability and flexibility for different users and computational environments. README.md files in each project directory ensure that users can easily follow setup instructions, with dependencies and versions documented using requirements.txt and environment.yml files for Python and Conda environments, respectively. Markdown (.md) files and Jupyter notebooks document each step of the data processing and model development pipelines, including visualization, error reporting, and parameter tuning. The repository tracks all changes to scripts and workflows, ensuring that team members can easily collaborate, roll back changes, and experiment with new methods. In September 2024, we began building a prototype of the Shiny app for soil health scoring. 
It is currently under development and is maintained within the GitHub repository. This app will serve as a tool for fellow scientists and growers to evaluate soil health in real time.
2. To Develop Deep Learning Models - Predict soil health scores using machine learning models that incorporate high-dimensional data for improved soil health assessments. During this last reporting period, we began exploring this objective using our FTIR dataset. We submitted our first publication for the grant, which reports the development of a machine learning algorithm (Gaussian Processing) and is currently under review (Tanner B. Beckstrom, Arianna Bunnell, Tai M. Maaz, Michael B. Kantar, Jonathan L. Deenik, Christine Tallamy Glazer, Peter Sadowski, Susan E. Crow. In review. Mid-infrared Spectroscopy and Machine Learning Improve Accessibility of Hawai'i Soil Health Assessment. SSSAJ.). This work will be highlighted in the keynote address of the 2024 ASA, CSSA, SSSA International Annual Meeting in San Antonio, TX, by co-PI Susan E. Crow (Rebuilding Health, Resilience, and Equity in Hawai'i's Agroecosystems, https://scisoc.confex.com/scisoc/2024am/meetingapp.cgi/Paper/156048). We have recruited and hired one PhD student to work on this objective, and we have built a relationship with faculty in the Department of Information and Computer Sciences. Given that this objective is contingent upon the previous objective, we will conduct most of this work in upcoming cycles.
3. To Build Pathways that Integrate Data Science into the Classroom and Beyond - Train undergraduate and graduate students in data science and data ethics through classroom instruction, internship opportunities, and assistantships. We initially devoted considerable time to adopting a data policy and technical summary of our soil health database before recruiting students and developing ethics training. We adopted the data policy and technical summary on April 25, 2024. 
Currently, we are training 1 post-doctoral researcher, 1 post-graduate, 1 PhD student, and 3 undergraduates. We recruited one of these undergraduate students from our previous internship program, and he is developing the Shiny app for soil health scoring. Students and postdocs working on this project have been trained to contribute to the repository, making them proficient in version control, reproducibility, and open science practices. This repository serves as a training ground for developing place-appropriate data science practices. In Fall 2024, we also launched TPSS 333. The purpose of this course is to prepare students to learn and implement principles of data analysis, visualization, and mapping. Our target audience is students (in agriculture, environmental science, geography, sustainability studies, and beyond) interested in developing a critical framework for understanding different ways of measuring, visualizing, and interpreting the different goals humans have for agroecosystems. The course focuses on recent advances in analytical techniques, which have enabled unprecedented ways of exploring the interdependence of our natural and human-mediated systems. As part of the course, we piloted an ethics training that uses a case study approach, first featuring research on kalo (taro) and indigenous data sovereignty. Utilizing our soil health database, we have also created a physical soil library space where students and researchers can access soil archetypes, which include unique combinations of soil mineralogy, current land management, and past land use. This library will support future research efforts.
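
The harmonization step described under Objective 1, Activity 4 (cleaning soil sample identifiers, then joining data streams on them in a relational database) might look roughly like the sketch below. It is an illustration under stated assumptions, not the project's schema or scripts: the table and column names, the ID cleaning rule, and every value are hypothetical, and each data stream is reduced to a single summary number per sample.

```python
import sqlite3


def normalize_sample_id(raw_id):
    """Hypothetical cleaning rule: trim, upper-case, and unify separators."""
    return raw_id.strip().upper().replace("_", "-").replace(" ", "-")


# Hypothetical raw records from three data streams with inconsistent IDs.
streams = {
    "indicator": [("hi-001 ", 0.72), ("HI_002", 0.41), ("HI-003", 0.55)],
    "ftir": [("HI-001", 0.35), ("hi_002", 0.28)],
    "microbiome": [("HI 001", 412), ("HI-003", 298)],
}

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE sample (sample_id TEXT PRIMARY KEY);
CREATE TABLE indicator (
    sample_id TEXT PRIMARY KEY REFERENCES sample(sample_id),
    health_score REAL);
CREATE TABLE ftir (
    sample_id TEXT PRIMARY KEY REFERENCES sample(sample_id),
    mean_absorbance REAL);
CREATE TABLE microbiome (
    sample_id TEXT PRIMARY KEY REFERENCES sample(sample_id),
    richness INTEGER);
""")

for table, rows in streams.items():
    for raw_id, value in rows:
        sample_id = normalize_sample_id(raw_id)
        # Register the cleaned ID once, then attach the stream's record to it.
        conn.execute("INSERT OR IGNORE INTO sample VALUES (?)", (sample_id,))
        conn.execute(f"INSERT INTO {table} VALUES (?, ?)", (sample_id, value))

# Inner joins keep only samples that are present in all three streams.
harmonized = conn.execute("""
SELECT i.sample_id, i.health_score, f.mean_absorbance, m.richness
FROM indicator AS i
JOIN ftir AS f ON f.sample_id = i.sample_id
JOIN microbiome AS m ON m.sample_id = i.sample_id
ORDER BY i.sample_id
""").fetchall()
print(harmonized)
```

Only the sample present in all three streams survives the inner joins. Swapping in LEFT JOIN would instead keep every sample that has indicator data and expose which streams are still missing for it.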

Publications