Source: UNIV OF HAWAII submitted to NRP
DSFAS: SOIL HEALTH FINGERPRINTING: RAPIDLY PREDICTING SOIL HEALTH IN A DIVERSITY OF SOILS USING MACHINE LEARNING
Sponsoring Institution
National Institute of Food and Agriculture
Project Status
ACTIVE
Funding Source
Reporting Frequency
Annual
Accession No.
1030595
Grant No.
2023-67021-40005
Cumulative Award Amt.
$649,570.00
Proposal No.
2022-11592
Multistate No.
(N/A)
Project Start Date
Sep 15, 2023
Project End Date
Sep 14, 2028
Grant Year
2023
Program Code
[A1541]- Food and Agriculture Cyberinformatics and Tools
Recipient Organization
UNIV OF HAWAII
3190 MAILE WAY
HONOLULU, HI 96822
Performing Department
(N/A)
Non Technical Summary
Our team recently created a tool that provides robust soil health assessments to help land stewards work towards their sustainability goals and support healthy communities. However, our current suite of soil health indicators is costly and time-consuming to measure. While we also have novel data resources such as soil spectra and microbiomes at our fingertips, these data have yet to be integrated into our soil health database, predictions, or education. Therefore, we propose to identify novel data resources that allow for more affordable, rapid, and comprehensive soil health testing in direct response to the Data Science for Food and Agriculture Systems call to "synthesize or analyze existing data and resources on soil health." We will first integrate these novel data streams into our current soil health database, and then test whether these new data resources can help us accurately predict soil health using machine and deep learning models. Finally, we will train undergraduate and graduate students through formalized data science internships, coursework, and assistantships. Through this project, we will help develop a deeper understanding of the relationship between the soil fingerprint (i.e., microbiome and spectroscopy) and its health status, an improved tool to effectively measure soil health, and a skilled workforce. Our combined expertise creates a unique opportunity to leverage existing resources and make advancements in research and education that will benefit students, land stewards, the next generation of researchers, and the community at large.
Animal Health Component
33%
Research Effort Categories
Basic
34%
Applied
33%
Developmental
33%
Classification

Knowledge Area (KA) | Subject of Investigation (SOI) | Field of Science (FOS) | Percent
102 | 0110 | 1070 | 100%
Knowledge Area
102 - Soil, Plant, Water, Nutrient Relationships;

Subject Of Investigation
0110 - Soil;

Field Of Science
1070 - Ecology;
Goals / Objectives
Our first goal is to harmonize existing data from several ongoing, interconnected research efforts in order to develop machine learning models that predict the health of a diversity of soil types, including but not limited to tropical and volcanic soils. We will first integrate three unique data streams into an innovative database: conventional soil health indicators, mid-infrared spectroscopy, and soil microbiomes. We will then explore supervised machine learning techniques and deep learning models to predict soil health based on the high-dimensional data. These predictive models will become available and accessible through their integration into our existing user-based Hawaii Soil Health Webtool platform, which supports monitoring efforts and adaptive management for the aggradation of soils and climate-ready landscapes.
Our second goal is to build up the capacity for place-appropriate data science. Given that community action is needed to protect, restore, and improve landscape health and resilience, we rely on the next generation of professionals to prioritize soil health through the implementation of evidence-based and equitable solutions. We will train undergraduate and graduate students through formalized internships, coursework, and assistantships in order to provide the skills needed to monitor soil health and soil-related ecosystem services using advanced data science techniques. However, training and monitoring efforts must be culturally appropriate. Therefore, we will build ethics training associated with Indigenous ways of being into all student training activities to support the emerging place-appropriate data science practices.
We propose three supporting objectives to help achieve these goals and outcomes:
1. To Harmonize Soil Health Data Streams - Engineer existing data streams to facilitate artificial intelligence and deep learning.
2. To Develop Deep Learning Models - Predict soil health scores using machine learning models that incorporate high-dimensional data for improved soil health assessments.
3. To Build Pathways that Integrate Data Science into the Classroom and Beyond - Train undergraduate and graduate students in data science and data ethics through classroom instruction, internship opportunities, and assistantships.
Project Methods
A deeper understanding of the relationship between the soil fingerprint (microbiome and spectroscopy) and its health status will enable more routine and cost-effective soil health testing to help elucidate management systems that enhance soil health, food production, and environmental outcomes. To achieve this, we will (i) apply our recently released soil health scoring function to all samples entering the Hawaii Soil Health Webtool database to produce soil health scores and classes, (ii) apply mid-infrared spectra-based machine learning algorithms (i.e., artificial neural networks, random forest, and memory-based learning) to predict soil health indicators and soil health scores, (iii) utilize measures of microbiome community composition and topological characteristics of cross-domain networks to inform machine learning models, and (iv) assess the ability of different machine learning algorithms to classify soil health based on these new high-resolution data streams. We will evaluate our progress by the achievement of specific milestones, including a soil health database with up-to-date soil health scores (year 1), a harmonized database with high-resolution data streams (i.e., spectroscopy and microbiome) (years 2-3), the evaluation of machine learning algorithms to predict soil health from multiple data streams (years 3-5), the matriculation of two graduate students (years 4-5), and the publication of 5 papers (years 3-5+).
We also aim to help students build the skills needed to become competitive in the workforce by using novel methods to solve problems in data-driven agriculture and conservation. Students will also receive explicit training in data ethics to guide the application of machine learning and work towards ethical and equitable soil health assessments. Our methods will include formal classroom instruction, development of curriculum or innovative teaching methodologies, internships, and extension and outreach.
We will evaluate educational activities and track metrics relevant to our objectives with IRB approval as needed. To assess performance, we will utilize a survey instrument developed by our research group that includes a set of questions assessing students' interest and knowledge levels before and after training, level of confidence in subject ability, and experience with course elements (e.g., group project). Specifically, the survey will address soil health tools and ethical concepts in data science, including soil health indicators, proxy measurements, database curation, machine learning, Native Hawaiian history, and data ethics. We will also seek student evaluations of education materials. For TPSS 333, we will ask students to evaluate our course through the Official University of Hawaii Course and Faculty Evaluation System (eCAFE), which is an entirely online evaluation system. For the internships, we will adapt and disseminate the Google Form evaluation currently used in our ongoing USDA REEU program. Finally, we will consider workforce placement of students involved in the project. We will consult two large potential local employers, who serve on our advisory group (as part of our Management Plan), for their input on the quality of skills the students acquire. We will also work with our undergraduate program to track the placement of students enrolled in our program. We will apply these methods in Years 3-5 of the project.

Progress 09/15/24 to 09/14/25

Outputs
Target Audience:
Partnership for Climate-Smart Commodities: As an integral part of the Hawai'i Partnership for Climate Smart Commodities, our data science team interacted with the following Producer Engagement Teams who represented various stakeholder and commodity groups on the islands: (1) Hawai'i Cattlemen's Council, (2) O'ahu Resource Conservation and Development Council, (3) O'ahu Agriculture and Conservation Association, (4) Office of Indigenous Knowledge and Innovation, (5) Forest Solutions, (6) The Kohala Center, and (7) Hawai'i Farmers Union United. These organizations were the liaison groups representing collaborating growers and land managers. Our team participated actively on the following committees of the Partnership: Producer Impact, Standard Operating Procedures, Data, MMRV, and USDA Data Reporting. Starting in May 2023, our team joined multiple meetings per week. However, in mid-April 2025, our Partnership for Climate Smart Commodities grant was terminated by the federal government. Since its termination, we have continued to partner with O'ahu Resource Conservation and Development Council to rapidly assess soil health in unexplored systems, such as lo'i kalo agriculture.
Producers: During the last reporting period, our data science team also supported rapid soil health assessments at Kako'o 'Oiwi to trial our machine learning algorithm-driven assessment of soil health. We provided 79 samples during June and July 2025, which were presented to the 9 kalo farmers on August 20, 2025.
Government: Our data science team also met with local and federal government agencies to provide an overview of our soil health tools. We held a meeting with a federal ARS unit on October 22, 2024 to share our tool functions and discuss future sampling in Hawai'i. We also met with local government officials on November 8, 2024 at the Hawai'i Agricultural Conference and served on a panel about new tools.
Non-profits: Our data science team directly engaged with The Nature Conservancy to support data science policy, needs, and management on June 23, 2025 and August 15, 2025.
Agribusiness: We also continued to engage directly with commercial entities. The first is a soil testing start-up (name withheld for privacy concerns), which is interested in using our soil health scoring function applied to FTIR measurements. We met on March 4, March 7, and April 11, 2025 to provide an update on the newly published tool. This year, we transitioned the Soil Health Webtool from the external development environment used previously to a dedicated server hosted by CTAHR. The college provided a secure, partitioned Red Hat Linux server with root access for our team. The webtool, database, and applications are now installed and functional, and the system is currently accessible internally through the UH VPN while final security and firewall steps are completed. Public access for producers and researchers is scheduled for the next project period.
Researchers and academics: Our team presented findings at multiple conferences: (1) We presented 5 abstracts on the Year 1 results at the 2024 Soil Science Society of America's CANVAS Conference, and our work was featured in a keynote talk by co-PI Susan Crow. (2) The project PI (Maaz) was an invited panelist for "CTAHR Introduces New Tools and Programs to Help Farmers" at the Hawai'i Ag2024 Conference on November 8, 2024. (3) PI Maaz also presented results from this project at the Western Nutrient Management Conference on March 5, 2025. (4) Graduate student Hannah Shor presented "Bayesian Soil Scoring for Effective Land Management in Hawai'i" at the college (CTAHR) Showcase and Research Symposium on April 4, 2025, where it won the departmental graduate student competition. (5) PI Maaz co-presented the soil health rapid test with a community partner organization for a special seminar series called Kahua Pa'a (Firm Foundation) on October 30.
Our team submitted two poster abstracts on Year 2 data for presentation at the 2025 Soil Science Society of America's CANVAS Conference. Finally, we engaged in a multi-institutional collaboration, with one important outcome being a data sharing policy agreement (outlined on April 29 and May 28, 2025).
Changes/Problems: The work continues to be surprisingly smooth and rewarding.
What opportunities for training and professional development has the project provided? During this last reporting period, we trained 1 post-doctoral researcher, 1 post-graduate, 3 PhD students, 3 undergraduates, and 1 high school student. This grant and program are supporting the professional development of postdocs and graduate students, and supported the following:
Publication:
Beckstrom, T. B., Bunnell, A., Maaz, T. M., Kantar, M. B., Deenik, J. L., Tallamy Glazer, C., ... & Crow, S. E. 2025. Mid-infrared spectroscopy and machine learning improve accessibility of Hawai'i soil health assessment. Soil Science Society of America Journal, 89(3), e70081.
Conference presentations:
Tinkey, S. and T.M. Maaz. (2025) Soil survey report of lo'i kalo systems. Kako'o 'Oiwi. Kaneohe, HI.
Kepa, K., T. Maaz, Z. Hastings Silao, L. Bremer, K. Keys, M. Wong. (2025) Measuring the Relationship of Total Organic Carbon and Distance to Trees. Summer Research Institute Research Symposium. College of Tropical Agriculture and Human Resilience. University of Hawai'i at Mānoa, Honolulu, HI.
Choi, J.B., T.M. Maaz, C. Fullmer, T. Beckstrom, A. Bunnell, J.L. Deenik, S. Crow. (2025) Re-evaluation and Update of FTIR Soil Health Score Prediction Model with Increased Data Volume. University of Hawai'i Undergraduate Showcase. Honolulu, HI.
Shor, H., T. Beckstrom, J. Deenik, S. Crow, T. Maaz. (2025) Bayesian Soil Scoring for Effective Land Management in Hawai'i: Incorporating Mineralogy, Climate, and Land Use History for Benchmark Scoring. CTAHR Showcase and Research Symposium, Honolulu, HI.
Graduate student winner.
Slanzon, G., McClellan Maaz, T., Crow, S. E., Deenik, J. L., Kantar, M., & Nguyen, N. (2024) Connecting Fungal-Bacterial Interactions and Soil Health in Tropical Soils [Abstract]. ASA, CSSA, SSSA International Annual Meeting, San Antonio, TX. https://scisoc.confex.com/scisoc/2024am/meetingapp.cgi/Paper/158436
Shor, H., Beckstrom, T. B., Estrada, K., McClellan Maaz, T., Deenik, J. L., Reyes, N., Mix, S., Loo, M. K., Sotomayor, D., Lawrence, D., Tallamy Glazer, C. J., & Crow, S. E. (2024) Unearthing Trends: Optimizing Tropical Soil Health Scoring Methods [Abstract]. ASA, CSSA, SSSA International Annual Meeting, San Antonio, TX. https://scisoc.confex.com/scisoc/2024am/meetingapp.cgi/Paper/160102
Estrada, K., McClellan Maaz, T., Beckstrom, T. B., Deenik, J. L., Reyes, N., Mix, S., Loo, M. K., Tallamy Glazer, C. J., Satdichanh, M., Lawrence, D., Rivera Zayas, J., Ticktin, T., Sotomayor, D., & Crow, S. E. (2024) The Dirt on Disturbance: Creating an Anthropogenic Disturbance Index to Objectively Capture Overlap across and within Land Use Types for Tropical and Subtropical Agroecosystems [Abstract]. ASA, CSSA, SSSA International Annual Meeting, San Antonio, TX. https://scisoc.confex.com/scisoc/2024am/meetingapp.cgi/Paper/159553
Beckstrom, T. B., Estrada, K., McClellan Maaz, T., Deenik, J. L., Reyes, N., Mix, S., Loo, M. K., Sotomayor, D., Lawrence, D., Tallamy Glazer, C. J., & Crow, S. E. (2024) Assessing (sub)Tropical Soil Health Along a Disturbance Gradient in Hawai'i, Puerto Rico, and Pohnpei [Abstract]. ASA, CSSA, SSSA International Annual Meeting, San Antonio, TX. https://scisoc.confex.com/scisoc/2024am/meetingapp.cgi/Paper/159558
Kiehl, K. D., Fullmer, C., Nguyen, N., McClellan Maaz, T., Crow, S. E., & Deenik, J. L. (2024) A Microbial Proxy for Mineralizable Nitrogen [Abstract]. ASA, CSSA, SSSA International Annual Meeting, San Antonio, TX.
https://scisoc.confex.com/scisoc/2024am/meetingapp.cgi/Paper/162330
Submitted:
Slanzon, G., Fullmer, C., Deenik, J. L., Nguyen, N., & McClellan Maaz, T. (2025) Soil Microbiome Health Index: A Scoring Approach to Predict Soil Health Based on Bacterial Composition of Cropland Soils in Hawaii [Abstract]. CANVAS 2025, Salt Lake City, UT. https://scisoc.confex.com/scisoc/2025am/meetingapp.cgi/Paper/167042
Fullmer, C., Slanzon, G., Crow, S. E., Deenik, J. L., Nguyen, N., & Maaz, T. M. (2025) Predicting Hawaiian Soil-Health Scores from Microbial Communities with Random-Forest Models [Abstract]. CANVAS 2025, Salt Lake City, UT. https://scisoc.confex.com/scisoc/2025am/meetingapp.cgi/Paper/169231
How have the results been disseminated to communities of interest?
Partnership for Climate-Smart Commodities: We have disseminated results via a seminar series.
Producers: We have disseminated soil health results to 9 lo'i kalo farmers.
Government: We have disseminated results via one oral presentation/panel.
Non-profits: We have disseminated results via one oral presentation.
Agribusiness: We have disseminated results to agribusiness via one oral presentation/panel.
Researchers and academics: We have disseminated results at four conferences/meetings via poster and oral presentations. We have one peer-reviewed publication.
What do you plan to do during the next reporting period to accomplish the goals?
Objective 1: To Harmonize Soil Health Data Streams - This objective is largely completed. However, we aim to continue data processing for soil samples entering our three data streams. We have deployed a Shiny App for soil health scoring and we will make it publicly available in the next period.
Objective 2: To predict soil health scores using machine learning models that incorporate high-dimensional data for improved soil health assessments - As our machine learning efforts progress, we aim to publish our microbial models.
We will also investigate machine learning models that combine MIR and microbial features for soil health assessments.
Objective 3: To Build Pathways that Integrate Data Science into the Classroom and Beyond - We will continue training postdocs, graduate students, and undergraduates. Last period, we hosted four internship experiences. Next period, we will also collaborate with professors across campus and across institutions to develop case studies for the data ethics lesson, and we will host more interns.

Impacts
What was accomplished under these goals?
Objective 1: Our team successfully harmonized all soil health data streams during the first reporting period, and these data now exist in the correct data structures and locations. We now have three databases as described in the first three activities of Objective 1. During this last reporting period, we constructed a relational database (final activity of Objective 1), as described below.
Objective 1, Activity 1: In summary, we have more than tripled our soil health database since this grant was submitted, and now have 2,036 samples with soil health indicator data and scores. During this reporting period, we also tested four iterations of the soil health scoring function, which we found to be highly reliable and stable. We also created a new function to score a new class of soils: Histosols.
Objective 1, Activity 2: During the first reporting period, we completed construction of the microbiome pipeline and generated quality-controlled datasets for fungal ITS and bacterial/archaeal 16S amplicons. During this reporting period, we expanded to 1,291 fungal ITS samples and 1,258 bacterial and archaeal 16S samples. Following quality control, we produced OTU feature tables including 782 fungal samples and 831 bacterial/archaeal samples.
Objective 1, Activity 3: During the first reporting period, we finalized the FTIR pipeline and created a machine learning algorithm to relate spectral data to soil health indicator values and scores. During the last reporting period, we published our FTIR scoring function, and we have demonstrated its utility for assessing on-farm soil health. Together with Activity 2, we have fully implemented and documented microbiome and FTIR pipelines for analyzing soil samples. These workflows are accessible and executable on High-Performance Computing (HPC) clusters using the portable GitHub structure.
Objective 1, Activity 4: During this last reporting period, we built a relational database to organize, store, and manage this large and complex soil health dataset, including microbiome data, FTIR spectral data, soil indicator data, and scores. Furthermore, we refined the controlled vocabulary used for collecting the metadata associated with the soil health tool, and aligned this vocabulary. We continue to follow best practices in reproducible workflow development, maintaining a well-organized shared repository with clear documentation, standardized file structures, and version-controlled scripts. During the last reporting period, we completed development of the soil health scoring Shiny app. It is now deployed on the college server and fully functional, with access available through the VPN while final security protocols are implemented. Once publicly released, it will provide scientists and growers with a real-time tool for evaluating soil health.
Objective 2: During this last reporting period, we published and released our machine learning algorithm (Gaussian process) based on MIR spectroscopy (Beckstrom, T. B., Bunnell, A., Maaz, T. M., Kantar, M. B., Deenik, J. L., Tallamy Glazer, C., ... & Crow, S. E. 2025. Mid-infrared spectroscopy and machine learning improve accessibility of Hawaii soil health assessment. Soil Science Society of America Journal, 89(3), e70081). This work was highlighted in the keynote address of the 2024 ASA, CSSA, SSSA International Annual Meeting in San Antonio, TX, by co-PI Susan E. Crow (Rebuilding Health, Resilience, and Equity in Hawai'i's Agroecosystems, https://scisoc.confex.com/scisoc/2024am/meetingapp.cgi/Paper/156048). Through this work, we supported two PhD students and one undergraduate student intern, who developed, tested, and fine-tuned the model. We now use the model to rapidly assess soil health for farmers.
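As an illustration of the relational structure described under Objective 1, Activity 4, the following is a minimal sketch of how indicator data, scores, FTIR spectra, and microbiome counts could be linked through a shared sample identifier. The table names, column names, and use of SQLite are assumptions made for this example; the report does not describe the project's actual schema or database engine.

```python
import sqlite3

# Hypothetical schema: every table and column name here is illustrative only.
SCHEMA = """
CREATE TABLE sample (
    sample_id  TEXT PRIMARY KEY,
    land_use   TEXT,
    soil_order TEXT
);
CREATE TABLE indicator (
    sample_id TEXT NOT NULL REFERENCES sample(sample_id),
    name      TEXT NOT NULL,
    value     REAL NOT NULL,
    PRIMARY KEY (sample_id, name)
);
CREATE TABLE health_score (
    sample_id TEXT PRIMARY KEY REFERENCES sample(sample_id),
    score     REAL NOT NULL
);
CREATE TABLE spectrum (
    sample_id  TEXT NOT NULL REFERENCES sample(sample_id),
    wavenumber REAL NOT NULL,
    absorbance REAL NOT NULL,
    PRIMARY KEY (sample_id, wavenumber)
);
CREATE TABLE otu_count (
    sample_id  TEXT NOT NULL REFERENCES sample(sample_id),
    marker     TEXT NOT NULL CHECK (marker IN ('16S', 'ITS')),
    otu_id     TEXT NOT NULL,
    read_count INTEGER NOT NULL,
    PRIMARY KEY (sample_id, marker, otu_id)
);
"""


def build_database(path=":memory:"):
    """Create the illustrative schema and return an open connection."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # enforce links to the sample table
    conn.executescript(SCHEMA)
    return conn


def samples_with_all_streams(conn):
    """Return IDs of samples that have a score, a spectrum, and OTU counts."""
    rows = conn.execute(
        """
        SELECT s.sample_id
        FROM sample AS s
        WHERE EXISTS (SELECT 1 FROM health_score AS h WHERE h.sample_id = s.sample_id)
          AND EXISTS (SELECT 1 FROM spectrum AS p WHERE p.sample_id = s.sample_id)
          AND EXISTS (SELECT 1 FROM otu_count AS o WHERE o.sample_id = s.sample_id)
        ORDER BY s.sample_id
        """
    ).fetchall()
    return [row[0] for row in rows]


# Toy records (made up): sample A has all three data streams, sample B only a score.
demo = build_database()
demo.executemany("INSERT INTO sample VALUES (?, ?, ?)",
                 [("A", "cropland", "Oxisol"), ("B", "forest", "Andisol")])
demo.executemany("INSERT INTO health_score VALUES (?, ?)", [("A", 0.62), ("B", 0.81)])
demo.execute("INSERT INTO spectrum VALUES (?, ?, ?)", ("A", 1650.0, 0.42))
demo.execute("INSERT INTO otu_count VALUES (?, ?, ?, ?)", ("A", "16S", "OTU_1", 120))
print(samples_with_all_streams(demo))
```

A query like `samples_with_all_streams` shows why a shared key matters: models that combine spectral and microbial features can only be trained on samples present in every stream.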
During this reporting period, we advanced our objective of developing machine-learning approaches for predicting soil health scores from high-dimensional microbiome data. We evaluated multiple modeling strategies using 16S and ITS rRNA amplicon sequence data collected across Hawaiian agroecosystems and forests. Initial random-forest regression models trained on the 100 most abundant OTUs from 636 fungal and 741 bacterial/archaeal samples achieved an RMSE of 0.14 (fungi) and 0.13 (bacteria), representing over a 50% reduction in prediction error relative to a null model. Variable-importance analysis identified several bacterial and fungal taxa contributing strongly to prediction accuracy, demonstrating clear biological interpretability and providing insights into microbial drivers of soil health.
In parallel, we developed the Soil Microbiome Health Index, a biologically interpretable scoring formula based on 64 bacterial genera associated with either healthy (n=43) or degraded (n=21) soils. The index was trained on 196 samples and validated on an independent set of 61 samples collected from 27 farms across four islands. The index showed a strong positive correlation with laboratory-derived soil health scores (R = 0.79, p < 0.001), greatly outperforming conventional diversity metrics (e.g., Shannon index, R = 0.04). The index was also strongly correlated with CO2 burst (R = 0.78, p < 0.001), further supporting its utility as a rapid biological indicator.
Collectively, these efforts demonstrate significant progress toward using microbial community data to generate rapid, affordable, and interpretable soil-health predictions. Ongoing work, including expansion of training datasets, application of compositional data transformations, and evaluation of deep-learning architectures, will further improve model performance and support deployment of microbiome-informed soil health assessment tools tailored to Hawai'i's agricultural landscapes.
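The comparison against a null model reported above can be made concrete with a short sketch. It assumes the null model predicts the mean training score for every sample, which is a common baseline but is not specified in this report; the function names and the toy numbers are hypothetical and are not project data.

```python
import math
from statistics import mean


def rmse(observed, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    return math.sqrt(mean((o - p) ** 2 for o, p in zip(observed, predicted)))


def null_model_rmse(train_scores, test_scores):
    """RMSE of a baseline that always predicts the mean training score.

    Assumption: this mean-only baseline stands in for the report's null model.
    """
    baseline = mean(train_scores)
    return rmse(test_scores, [baseline] * len(test_scores))


def relative_reduction(model_rmse, null_rmse):
    """Fraction by which the model lowers RMSE relative to the null model."""
    return 1.0 - model_rmse / null_rmse


# Toy soil health scores on a 0-1 scale (made up for illustration).
train_scores = [0.2, 0.4, 0.6, 0.8]
test_scores = [0.3, 0.7]
model_predictions = [0.35, 0.65]

null_error = null_model_rmse(train_scores, test_scores)
model_error = rmse(test_scores, model_predictions)
print(f"null RMSE: {null_error:.2f}")
print(f"model RMSE: {model_error:.2f}")
print(f"reduction: {relative_reduction(model_error, null_error):.0%}")
```

Under this definition, a reduction above 50% means the model's RMSE is less than half that of the mean-only baseline evaluated on the same held-out samples.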
Objective 3: Last period, we implemented a data policy and technical summary (adopted on April 25, 2024). As part of the adopted protocol for accessing the data, we implemented a data request form that informs data users of the policies and practices that are requisite for using the data. This enables us to balance the ability to share data with the imperative to protect it. During the last reporting period, we trained 1 post-doctoral researcher, 1 post-graduate, 3 PhD students, 3 undergraduates, and 1 high school student. One of these undergraduate students went on to perform undergraduate research with us to refine our MIR machine learning model. Students and postdocs working on this project have been trained to contribute to the repository, making them proficient in version control, reproducibility, and open science practices. This repository serves as a training ground for developing place-appropriate data science practices.
In Fall 2024, co-PD Kantar and PD Maaz taught TPSS 333. The purpose of this course is to prepare students to learn and implement principles of data analysis, visualization, and mapping. Our target audience is students (in agriculture, environmental science, geography, sustainability studies, and beyond) interested in developing a critical framework for understanding different ways of measuring, visualizing, and interpreting the different goals humans have for agroecosystems. This course focused on recent advances in analytical techniques that have allowed an unprecedented way of exploring the interdependence of our natural and human-mediated systems. As part of the course, we piloted an ethics training that uses a case studies approach. We have also organized a meeting to take place in January 2026 with 40 invited experts to develop curriculum and instructional tools for learning data ethics in the classroom.
Utilizing our soil health database, we also launched a physical soil library space for students and researchers to access soil archetypes which include unique combinations of soil mineralogy, current land management, and past land use. This library is currently supporting graduate research efforts and high school learning opportunities.

Publications

  • Type: Peer Reviewed Journal Articles Status: Published Year Published: 2025 Citation: Beckstrom, Tanner B., Arianna Bunnell, Tai M. Maaz, Michael B. Kantar, Jonathan L. Deenik, Christine Tallamy Glazer, Peter Sadowski, and Susan E. Crow. "Mid-infrared spectroscopy and machine learning improve accessibility of Hawaii soil health assessment." Soil Science Society of America Journal 89, no. 3 (2025): e70081.


Progress 09/15/23 to 09/14/24

Outputs
Target Audience:
Partnership for Climate-Smart Commodities: As an integral part of the Hawai'i Partnership for Climate Smart Commodities, our data science team interacts with the following Producer Engagement Teams who represent various stakeholder and commodity groups on the islands: (1) Hawai'i Cattlemen's Council, (2) O'ahu Resource Conservation and Development Council, (3) O'ahu Agriculture and Conservation Association, (4) Office of Indigenous Knowledge and Innovation, (5) Forest Solutions, (6) The Kohala Center, and (7) Hawai'i Farmers Union United. These organizations are the liaison groups representing collaborating growers and land managers. Our team members are active participants on the following committees of the Partnership: Producer Impact, Standard Operating Procedures, Data, MMRV, and USDA Data Reporting. Since May 2023, our team has joined multiple meetings per week.
Producers: During the last reporting period, our data science team also supported the USDA-funded soil health farmer cohort program led by O'ahu Resource Conservation and Development Council. We provided 7 one-on-one sessions to participating growers on interpreting their soil health testing reports from Jan 24 to 31, 2024. We also engaged with a local farm (name withheld for privacy concerns) on a soil fertility and soil health educational project to develop management plans using project data, which were presented to the farmer on May 6, 2024.
Government: Our data science team also met with federal government agencies to provide an overview of our soil health tools. This includes meetings with the USDA NRCS local, regional, and national offices and ARS. Two meetings took place during this project reporting period, on Sept 14, 2023, and June 20, 2024. We also reached out to an ARS unit focused on soil health measurements during this reporting period, and we scheduled a meeting for October 22, 2024.
Non-profits: Our data science team has directly engaged and partnered with two community-based organizations (names withheld for privacy concerns). We held meetings directly focused on data science policy, needs, and management on Feb 16, 2023, Oct 13, 2023, and Oct 29, 2024.
Agribusiness: We have also engaged directly with commercial entities. The first is a soil testing start-up (name withheld for privacy concerns), which is interested in using our soil health scoring function applied to FTIR measurements. We have met with the university about the commercialization potential of our tool throughout the reporting period. We are also in the process of transitioning to host our soil health webtool with a local secure, multiplatform, location-based software company. This platform is popular with producers, and it will allow us to interact with producers, accept soil samples, and report results.
Researchers and academics: Our team presented our first-year data science findings at the 2023 Soil Science Society of America Conference (one oral and one poster presentation) and the 11th BIOGEOMON International Symposium on Ecosystem Behavior (one poster presentation). We also submitted 5 abstracts to the 2024 Soil Science Society of America Conference, and our work was featured in a keynote talk by co-PI Susan Crow. The project PI (Maaz) also attended the NIFA Project Directors Meeting in Manhattan, KS, and was invited to present the soil health data science tools as part of the Climate-Smart Partnership Seminar Series and as a panelist for "CTAHR Introduces New Tools and Programs to Help Farmers" at the Hawai'i Ag2024 Conference.
Changes/Problems: No major changes to the approach or challenges. The work has been surprisingly smooth and rewarding. However, one upcoming concern is the timing of the data science internship component of this grant.
In working with the Information and Computer Sciences department, we learned that hosting interns throughout the year may be more desirable than during the summer. However, we requested summer internships, so we may need to request formal permission to host undergraduate interns throughout the year.
What opportunities for training and professional development has the project provided? Currently, we are training 1 post-doctoral researcher, 1 post-graduate, 1 PhD student, and 3 undergraduates. We recruited one of these undergraduate students from our previous internship program, and he is developing the Shiny app for soil health scoring. Since January 2024, our data science group has held a weekly writing workshop to work on manuscripts, proposals, and theses. The paper that we submitted as part of this grant came from this workshop. In total, students of the workshop have generated two manuscripts and one thesis. We now have four papers and two proposals in preparation. As mentioned in the accomplishments section, we launched TPSS 333, which is a course designed for natural science students to gain data science skills. We modeled this course on our previous data science internship program (NIFA workforce development grant), for which we hosted our final cohort during this reporting period. This grant and program are supporting the professional development of postdocs and graduate students, and supported the following conference submissions to the SSSA meeting after only one year of the project:
Slanzon, G., McClellan Maaz, T., Crow, S. E., Deenik, J. L., Kantar, M., & Nguyen, N. (2024) Connecting Fungal-Bacterial Interactions and Soil Health in Tropical Soils [Abstract]. ASA, CSSA, SSSA International Annual Meeting, San Antonio, TX. https://scisoc.confex.com/scisoc/2024am/meetingapp.cgi/Paper/158436
Shor, H., Beckstrom, T. B., Estrada, K., McClellan Maaz, T., Deenik, J. L., Reyes, N., Mix, S., Loo, M. K., Sotomayor, D., Lawrence, D., Tallamy Glazer, C. J., & Crow, S. E. (2024) Unearthing Trends: Optimizing Tropical Soil Health Scoring Methods [Abstract]. ASA, CSSA, SSSA International Annual Meeting, San Antonio, TX. https://scisoc.confex.com/scisoc/2024am/meetingapp.cgi/Paper/160102
Estrada, K., McClellan Maaz, T., Beckstrom, T. B., Deenik, J. L., Reyes, N., Mix, S., Loo, M. K., Tallamy Glazer, C. J., Satdichanh, M., Lawrence, D., Rivera Zayas, J., Ticktin, T., Sotomayor, D., & Crow, S. E. (2024) The Dirt on Disturbance: Creating an Anthropogenic Disturbance Index to Objectively Capture Overlap across and within Land Use Types for Tropical and Subtropical Agroecosystems [Abstract]. ASA, CSSA, SSSA International Annual Meeting, San Antonio, TX. https://scisoc.confex.com/scisoc/2024am/meetingapp.cgi/Paper/159553
Beckstrom, T. B., Estrada, K., McClellan Maaz, T., Deenik, J. L., Reyes, N., Mix, S., Loo, M. K., Sotomayor, D., Lawrence, D., Tallamy Glazer, C. J., & Crow, S. E. (2024) Assessing (sub)Tropical Soil Health Along a Disturbance Gradient in Hawai'i, Puerto Rico, and Pohnpei [Abstract]. ASA, CSSA, SSSA International Annual Meeting, San Antonio, TX. https://scisoc.confex.com/scisoc/2024am/meetingapp.cgi/Paper/159558
Kiehl, K. D., Fullmer, C., Nguyen, N., McClellan Maaz, T., Crow, S. E., & Deenik, J. L. (2024) A Microbial Proxy for Mineralizable Nitrogen [Abstract]. ASA, CSSA, SSSA International Annual Meeting, San Antonio, TX. https://scisoc.confex.com/scisoc/2024am/meetingapp.cgi/Paper/162330
How have the results been disseminated to communities of interest? Below are ways in which data are disseminated to the following audiences.
Partnership for Climate-Smart Commodities: We have disseminated results via a seminar series and various biweekly meetings.
Producers: We have disseminated results to producers via seven one-on-one training sessions and by generating dozens of soil health reports that provide soil health scores. We are actively working with the new software company to generate reports via their reporting platform.
Government: We have disseminated results via three oral presentations and three two-page briefings.
Non-profits: We have disseminated results via two oral presentations with accompanying printed handouts.
Agribusiness: We have not disseminated results to agribusiness.
Researchers and academics: We have disseminated results at two conferences and one meeting via poster and oral presentations. We have submitted one publication for peer review.
What do you plan to do during the next reporting period to accomplish the goals? Objective 1: To Harmonize Soil Health Data Streams - This objective is largely completed. However, we aim to continue data processing for soil samples entering our three data streams. We will publicly release the Shiny app for soil health scoring during the next reporting period.
Objective 2: To predict soil health scores using machine learning models that incorporate high-dimensional data for improved soil health assessments - This objective's activities will be the focus of our next reporting period. As our machine learning efforts progress, we will add models and pipelines to our repository, including deep learning scripts for soil health prediction. We aim to make sections of the repository public by the next reporting period to share reproducible pipelines with the wider scientific community and stakeholders.
Objective 3: To Build Pathways that Integrate Data Science into the Classroom and Beyond - We will continue training postdocs, graduate students, and undergraduates. We will launch our first undergraduate internship experience.
In Spring 2025, we will also collaborate with professors across campus (one in sustainability, one in hydrology, one in oceanography, one in plant science, and one in natural resource management) to develop case studies for the data ethics lesson.

Impacts
What was accomplished under these goals? 1. To Harmonize Soil Health Data Streams - Engineer existing data streams to facilitate artificial intelligence and deep learning. Our team was able to successfully harmonize all soil health data streams during this first reporting period, which currently exist in the correct data structures and locations. We now have three databases as described in the first three activities of Objective 1, and are actively in the process of constructing a relational database (final activity of Objective 1), as described below. Objective 1, Activity 1: In summary, we have tripled our soil health database since this grant was submitted, and now have over 1500 samples with soil health indicator data and scores. During this reporting period, we also tested three iterations of the soil health scoring function, which we found to be highly reliable and stable. We presented the results of the second iteration at a conference (McClellan Maaz, T., Crow, S. E., Deenik, J. L., Loo, K., Tallamy-Glazer, C. J., & Beckstrom, T. B. 2023. Measuring the Immeasurable: A Structural Equation Modeling Approach to Conceptualizing and Scaling up Soil Health Assessments [Abstract]. ASA, CSSA, SSSA International Annual Meeting, St. Louis, MO.). We are developing a manuscript outlining the results of the third iteration. Objective 1, Activity 2: We also finished constructing the microbiome pipeline and processed data for fungal ITS (500 samples) and bacterial and archaeal 16S (703 samples). Objective 1, Activity 3: We finalized the FTIR pipeline and created a machine learning algorithm to relate spectral data to soil health indicator values and scores. Together with Activity 2, we fully implemented and documented microbiome and FTIR pipelines for analyzing soil samples. These workflows are accessible and executable on High-Performance Computing (HPC) clusters using the portable GitHub structure. 
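As an illustration of how spectral data can be related to a soil health score, the following is a minimal sketch rather than the project's pipeline. It assumes Gaussian process regression with a squared-exponential (RBF) kernel, consistent with the Gaussian process approach named under Objective 2. The function names, kernel choice, hyperparameters (noise, length_scale), and toy three-band "spectra" are all hypothetical and are not taken from the project repository.

```python
import math

def rbf_kernel(a, b, length_scale=1.0):
    """Squared-exponential similarity between two spectra (lists of floats)."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-sq_dist / (2.0 * length_scale ** 2))

def solve_linear(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x

def fit_gp(train_spectra, train_scores, noise=1e-3, length_scale=1.0):
    """Fit a zero-mean GP to mean-centered scores; returns a small model dict."""
    mean_score = sum(train_scores) / len(train_scores)
    centered = [s - mean_score for s in train_scores]
    # Kernel matrix with observation noise added on the diagonal.
    gram = [
        [rbf_kernel(a, b, length_scale) + (noise if i == j else 0.0)
         for j, b in enumerate(train_spectra)]
        for i, a in enumerate(train_spectra)
    ]
    alpha = solve_linear(gram, centered)
    return {"spectra": train_spectra, "alpha": alpha,
            "mean": mean_score, "length_scale": length_scale}

def predict_gp(model, spectrum):
    """Posterior mean prediction of the score for one new spectrum."""
    k_star = [rbf_kernel(spectrum, s, model["length_scale"])
              for s in model["spectra"]]
    return model["mean"] + sum(k * a for k, a in zip(k_star, model["alpha"]))

# Toy data (hypothetical): three "spectra" with three absorbance bands each.
toy_spectra = [[0.1, 0.5, 0.9], [0.9, 0.4, 0.1], [0.5, 0.5, 0.5]]
toy_scores = [0.80, 0.30, 0.55]
toy_model = fit_gp(toy_spectra, toy_scores, noise=1e-3, length_scale=0.5)
print(predict_gp(toy_model, [0.2, 0.5, 0.8]))
```

Real FTIR spectra have hundreds to thousands of bands and typically require preprocessing and hyperparameter tuning, so a maintained library implementation would normally replace the hand-written solver shown here.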
Objective 1, Activity 4: During this process, we created scripts to join microbiome and FTIR data streams using unique soil sample identifiers. The harmonized data now integrate multiple sources and are ready for machine learning model training. To accomplish this task, we performed extensive data cleaning to ensure that the soil ID was accurate and accessible. We are currently building a relational database to organize, store, and manage this large and complex soil health dataset, including microbiome data, FTIR spectral data, soil indicator data, and scores. Furthermore, we are refining the controlled vocabulary used for collecting the metadata associated with the soil health tool, and aligning this vocabulary as we transition to hosting the tool on the new software platform.
During this reporting period, we also devoted considerable time to establishing a collaborative remote repository for sharing and maintaining reproducible workflows for bioinformatics, soil health scoring, and machine learning model development. We organized the repository structure with clear folder hierarchies for each data stream (microbiome, FTIR, soil health indicators). All scripts use relative file paths to ensure portability and flexibility for different users and computational environments. README.md files in each project directory ensure that users can easily follow setup instructions, with dependencies and versions documented using requirements.txt and environment.yml files for Python and Conda environments, respectively. Markdown (.md) files and Jupyter notebooks document each step of the data processing and model development pipelines, including visualization, error reporting, and parameter tuning. The repository tracks all changes to scripts and workflows, ensuring that team members can easily collaborate, roll back changes, and experiment with new methods. In September 2024, we began building a prototype of the Shiny app for soil health scoring.
It is currently under development and is maintained within the GitHub repository. This app will serve as a tool for fellow scientists and growers to evaluate soil health in real time.
2. To Develop Deep Learning Models - Predict soil health scores using machine learning models that incorporate high-dimensional data for improved soil health assessments. During this reporting period, we began exploring this objective using our FTIR dataset. We submitted our first publication for the grant, which reports the development of a machine learning algorithm (Gaussian process) and is currently under review at SSSAJ (Tanner B. Beckstrom, Arianna Bunnell, Tai M. Maaz, Michael B. Kantar, Jonathan L. Deenik, Christine Tallamy Glazer, Peter Sadowski, Susan E. Crow. In review. Mid-infrared Spectroscopy and Machine Learning Improve Accessibility of Hawai'i Soil Health Assessment. SSSAJ). This work will be highlighted in the keynote address of the 2024 ASA, CSSA, SSSA International Annual Meeting in San Antonio, TX, by co-PI Susan E. Crow (Rebuilding Health, Resilience, and Equity in Hawai'i's Agroecosystems, https://scisoc.confex.com/scisoc/2024am/meetingapp.cgi/Paper/156048). We have recruited and hired one PhD student to work on this objective, and we have built a relationship with faculty in the Department of Information and Computer Sciences. Given that this objective is contingent upon the previous objective, we will conduct most of this work in upcoming cycles.
3. To Build Pathways that Integrate Data Science into the Classroom and Beyond - Train undergraduate and graduate students in data science and data ethics through classroom instruction, internship opportunities, and assistantships. We initially devoted considerable time to adopting a data policy and technical summary of our soil health database before recruiting students and developing ethics training. We adopted the data policy and technical summary on April 25, 2024.
Currently, we are training 1 post-doctoral researcher, 1 post-graduate, 1 PhD student, and 3 undergraduates. We recruited one of these undergraduate students from our previous internship program, and he is developing the Shiny app for soil health scoring. Students and postdocs working on this project have been trained to contribute to the repository, making them proficient in version control, reproducibility, and open science practices. This repository serves as a training ground for developing place-appropriate data science practices. In Fall 2024, we also launched TPSS 333. The purpose of this course is to prepare students to learn and implement principles of data analysis, visualization, and mapping. Our target audience is students (in agriculture, environmental science, geography, sustainability studies, and beyond) interested in developing a critical framework for understanding different ways of measuring, visualizing, and interpreting the different goals humans have for agroecosystems. The course focuses on recent advances in analytical techniques that have enabled unprecedented ways of exploring the interdependence of our natural and human-mediated systems. As part of the course, we piloted an ethics training that uses a case-study approach, beginning with research on kalo (taro) and Indigenous data sovereignty. Using our soil health database, we have also created a physical soil library space where students and researchers can access soil archetypes, which include unique combinations of soil mineralogy, current land management, and past land use. This library will support future research efforts.
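The identifier-based join described under Objective 1, Activity 4 can be illustrated with a short sketch. This is not the project's script: the function names, the ID-cleaning rule (trimming whitespace and upper-casing), the sample IDs, and the field names below are hypothetical and chosen only to show the idea of an inner join on a cleaned soil sample ID, with unmatched IDs reported for follow-up cleaning.

```python
def normalize_id(raw_id):
    """Trim whitespace and upper-case an ID (a hypothetical cleaning rule)."""
    return raw_id.strip().upper()

def join_streams(**streams):
    """Inner-join per-sample records from several data streams on sample ID.

    Each keyword argument maps a stream name to a dict of {sample_id: record}.
    Returns (joined, unmatched): joined holds samples present in every stream;
    unmatched lists the IDs missing from at least one stream.
    """
    cleaned = {
        name: {normalize_id(sid): rec for sid, rec in records.items()}
        for name, records in streams.items()
    }
    id_sets = [set(records) for records in cleaned.values()]
    shared = set.intersection(*id_sets)
    every = set.union(*id_sets)
    joined = {
        sid: {name: cleaned[name][sid] for name in cleaned}
        for sid in sorted(shared)
    }
    unmatched = sorted(every - shared)
    return joined, unmatched

# Toy records (hypothetical IDs and fields) for the three data streams.
indicators = {"HI-001": {"score": 0.72}, "HI-002": {"score": 0.41}}
microbiome = {"hi-001 ": {"n_asv": 350}, "HI-003": {"n_asv": 120}}
ftir = {"HI-001": [0.12, 0.40], "HI-002": [0.20, 0.33]}

joined, unmatched = join_streams(
    indicators=indicators, microbiome=microbiome, ftir=ftir
)
print(joined)
print(unmatched)
```

Reporting the unmatched IDs alongside the joined records keeps samples with a missing data stream visible instead of silently dropping them, which matters when the joined table feeds model training.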

Publications