Progress 09/15/23 to 09/14/24
Outputs Target Audience:
Partnership for Climate-Smart Commodities: As an integral part of the Hawai'i Partnership for Climate-Smart Commodities, our data science team interacts with the following Producer Engagement Teams, which represent various stakeholder and commodity groups on the islands: (1) Hawai'i Cattlemen's Council, (2) O'ahu Resource Conservation and Development Council, (3) O'ahu Agriculture and Conservation Association, (4) Office of Indigenous Knowledge and Innovation, (5) Forest Solutions, (6) The Kohala Center, and (7) Hawai'i Farms Union United. These organizations are the liaison groups representing collaborating growers and land managers. Our team members are active participants in the following committees of the Partnership: Producer Impact, Standard Operating Procedures, Data, MMRV, and USDA Data Reporting. Since May 2023, our team has joined multiple meetings per week.
Producers: During this reporting period, our data science team also supported the USDA-funded soil health farmer cohort program led by the O'ahu Resource Conservation and Development Council. We provided 7 one-on-one sessions to participating growers on interpreting their soil health testing reports from Jan 24 to 31, 2024. We also engaged with a local farm (name withheld for privacy concerns) on a soil fertility and soil health educational project to develop management plans using project data, which were presented to the farmer on May 6, 2024.
Government: Our data science team also met with federal government agencies to provide an overview of our soil health tools. These included meetings with USDA NRCS local, regional, and national offices and with ARS. Two meetings took place during this project reporting period, on Sept 14, 2023, and June 20, 2024. We also reached out to an ARS unit focused on soil health measurements during this reporting period, and we scheduled a meeting for October 22, 2024.
Non-profits: Our data science team has directly engaged and partnered with two community-based organizations (names withheld for privacy concerns). We held meetings directly focused on data science policy, needs, and management on Feb 16, 2023, Oct 13, 2023, and Oct 29, 2024.
Agribusiness: We have also engaged directly with commercial entities. The first is a soil testing start-up (name withheld for privacy concerns), which is interested in using our soil health scoring function applied to FTIR measurements. We have met with the university about the commercialization potential of our tool throughout the reporting period. We are also in the process of transitioning our soil health web tool to a platform hosted by a local company that provides secure, multiplatform, location-based software. This platform is popular with producers, and it will allow us to interact with producers, accept soil samples, and report results.
Researchers and academics: Our team presented our first-year data science findings at the 2023 Soil Science Society of America Conference (one oral and one poster presentation) and the 11th BIOGEOMON International Symposium on Ecosystem Behavior (one poster presentation). We also submitted 5 abstracts to the 2024 Soil Science Society of America Conference, and our work was featured in a keynote talk by co-PI Susan Crow. The project PI (Maaz) also attended the NIFA Project Directors Meeting in Manhattan, KS, and was invited to present the soil health data science tools as part of the Climate-Smart Partnership Seminar Series and as a panelist for "CTAHR Introduces New Tools and Programs to Help Farmers" at the Hawai'i Ag2024 Conference.
Changes/Problems: No major changes to the approach or challenges. The work has been surprisingly smooth and rewarding. However, one upcoming concern is the timing of the data science internship component of this grant.
In working with the Information and Computer Sciences department, we found that hosting interns throughout the year may be more desirable than hosting them only during the summer. However, we requested summer internships, so we might need to request formal permission to host undergraduate interns throughout the year.
What opportunities for training and professional development has the project provided?
Currently, we are training 1 post-doctoral researcher, 1 post-graduate, 1 PhD student, and 3 undergraduates. We recruited one of these undergraduate students from our previous internship program, and he is developing the Shiny app for soil health scoring. Since January 2024, our data science group has held a weekly writing workshop to work on manuscripts, proposals, and theses. The paper that we submitted as part of this grant came from this workshop. In total, workshop participants have generated two manuscripts and one thesis. We now have four papers and two proposals in preparation. As mentioned in the accomplishments section, we launched TPSS 333, which is a course designed for natural science students to gain data science skills. We modeled this course on our previous data science internship program (NIFA workforce development grant), for which we hosted our final cohort during this reporting period. This grant and the program support the professional development of postdocs and graduate students and, after only one year of the project, supported the following conference submissions to the SSSA meeting:
Slanzon, G., McClellan Maaz, T., Crow, S. E., Deenik, J. L., Kantar, M., & Nguyen, N. (2024) Connecting Fungal-Bacterial Interactions and Soil Health in Tropical Soils [Abstract]. ASA, CSSA, SSSA International Annual Meeting, San Antonio, TX. https://scisoc.confex.com/scisoc/2024am/meetingapp.cgi/Paper/158436
Shor, H., Beckstrom, T. B., Estrada, K., McClellan Maaz, T., Deenik, J. L., Reyes, N., Mix, S., Loo, M. K., Sotomayor, D., Lawrence, D., Tallamy Glazer, C. J., & Crow, S. E. (2024) Unearthing Trends: Optimizing Tropical Soil Health Scoring Methods [Abstract]. ASA, CSSA, SSSA International Annual Meeting, San Antonio, TX. https://scisoc.confex.com/scisoc/2024am/meetingapp.cgi/Paper/160102
Estrada, K., McClellan Maaz, T., Beckstrom, T. B., Deenik, J. L., Reyes, N., Mix, S., Loo, M. K., Tallamy Glazer, C. J., Satdichanh, M., Lawrence, D., Rivera Zayas, J., Ticktin, T., Sotomayor, D., & Crow, S. E. (2024) The Dirt on Disturbance: Creating an Anthropogenic Disturbance Index to Objectively Capture Overlap across and within Land Use Types for Tropical and Subtropical Agroecosystems [Abstract]. ASA, CSSA, SSSA International Annual Meeting, San Antonio, TX. https://scisoc.confex.com/scisoc/2024am/meetingapp.cgi/Paper/159553
Beckstrom, T. B., Estrada, K., McClellan Maaz, T., Deenik, J. L., Reyes, N., Mix, S., Loo, M. K., Sotomayor, D., Lawrence, D., Tallamy Glazer, C. J., & Crow, S. E. (2024) Assessing (sub)Tropical Soil Health Along a Disturbance Gradient in Hawai'i, Puerto Rico, and Pohnpei [Abstract]. ASA, CSSA, SSSA International Annual Meeting, San Antonio, TX. https://scisoc.confex.com/scisoc/2024am/meetingapp.cgi/Paper/159558
Kiehl, K. D., Fullmer, C., Nguyen, N., McClellan Maaz, T., Crow, S. E., & Deenik, J. L. (2024) A Microbial Proxy for Mineralizable Nitrogen [Abstract]. ASA, CSSA, SSSA International Annual Meeting, San Antonio, TX. https://scisoc.confex.com/scisoc/2024am/meetingapp.cgi/Paper/162330
How have the results been disseminated to communities of interest?
Below are the ways in which results have been disseminated to each audience.
Partnership for Climate-Smart Commodities: We have disseminated results via a seminar series and various biweekly meetings.
Producers: We have disseminated results to producers via seven one-on-one training sessions and by generating dozens of soil health reports that provide soil health scores. We are actively working with the new software company to generate reports via their reporting platform.
Government: We have disseminated results via three oral presentations and three two-page briefings.
Non-profits: We have disseminated results via two oral presentations with accompanying printed handouts.
Agribusiness: We have not disseminated results to agribusiness.
Researchers and academics: We have disseminated results at two conferences and one meeting via poster and oral presentations. We have submitted one publication for peer review.
What do you plan to do during the next reporting period to accomplish the goals?
Objective 1: To Harmonize Soil Health Data Streams - This objective is largely completed. However, we aim to continue data processing for soil samples entering our three data streams. We will publicly release the Shiny app for soil health scoring during the next reporting period.
Objective 2: To predict soil health scores using machine learning models that incorporate high-dimensional data for improved soil health assessments - This objective's activities will be the focus of our next reporting period. As our machine learning efforts progress, we will add models and pipelines to our repository, including deep learning scripts for soil health prediction. We aim to make sections of the repository public by the next reporting period to share reproducible pipelines with the wider scientific community and stakeholders.
Objective 3: To Build Pathways that Integrate Data Science into the Classroom and Beyond - We will continue training postdocs, graduate students, and undergraduates. We will launch our first undergraduate internship experience.
In Spring 2025, we will also collaborate with professors across campus (one in sustainability, one in hydrology, one in oceanography, one in plant science, and one in natural resource management) to develop case studies for the data ethics lesson.
Impacts What was accomplished under these goals?
1. To Harmonize Soil Health Data Streams - Engineer existing data streams to facilitate artificial intelligence and deep learning.
Our team successfully harmonized all soil health data streams during this first reporting period; they now exist in the correct data structures and locations. We now have three databases, as described in the first three activities of Objective 1, and are constructing a relational database (final activity of Objective 1), as described below.
Objective 1, Activity 1: In summary, we have tripled the size of our soil health database since this grant was submitted, and now have over 1500 samples with soil health indicator data and scores. During this reporting period, we also tested three iterations of the soil health scoring function, which we found to be highly reliable and stable. We presented the results of the second iteration at a conference (McClellan Maaz, T., Crow, S. E., Deenik, J. L., Loo, K., Tallamy-Glazer, C. J., & Beckstrom, T. B. 2023. Measuring the Immeasurable: A Structural Equation Modeling Approach to Conceptualizing and Scaling up Soil Health Assessments [Abstract]. ASA, CSSA, SSSA International Annual Meeting, St. Louis, MO.). We are developing a manuscript outlining the results of the third iteration.
Objective 1, Activity 2: We also finished constructing the microbiome pipeline and processed data for fungal ITS (500 samples) and bacterial and archaeal 16S (703 samples).
Objective 1, Activity 3: We finalized the FTIR pipeline and created a machine learning algorithm to relate spectral data to soil health indicator values and scores. Together with Activity 2, we fully implemented and documented microbiome and FTIR pipelines for analyzing soil samples. These workflows are accessible and executable on High-Performance Computing (HPC) clusters using the portable GitHub structure.
Objective 1, Activity 4: During this process, we created scripts to join microbiome and FTIR data streams using unique soil sample identifiers. The harmonized data now integrate multiple sources and are ready for machine learning model training. To accomplish this task, we performed extensive data cleaning to ensure that the soil ID was accurate and accessible. We are currently building a relational database to organize, store, and manage this large and complex soil health dataset, including microbiome, FTIR spectral data, soil indicator data, and scores. Furthermore, we are refining the controlled vocabulary used for collecting the metadata associated with the soil health tool, and aligning this vocabulary as we transition to hosting the tool on the new software platform.
During this reporting period, we devoted substantial time to establishing a collaborative remote repository for sharing and maintaining reproducible workflows for bioinformatics, soil health scoring, and machine learning model development. We organized the repository structure with clear folder hierarchies for each data stream (microbiome, FTIR, soil health indicators). All scripts use relative file paths to ensure portability and flexibility for different users and computational environments. README.md files in each project directory ensure that users can easily follow setup instructions, with dependencies and versions documented using requirements.txt and environment.yml files for Python and Conda environments, respectively. Markdown (.md) files and Jupyter notebooks document each step of the data processing and model development pipelines, including visualization, error reporting, and parameter tuning. The repository tracks all changes to scripts and workflows, ensuring that team members can easily collaborate, roll back changes, and experiment with new methods.
In September 2024, we began building a prototype of the Shiny app for soil health scoring.
It is currently under development and is maintained within the GitHub repository. This app will serve as a tool for fellow scientists and growers to evaluate soil health in real time.
2. To Develop Deep Learning Models - Predict soil health scores using machine learning models that incorporate high-dimensional data for improved soil health assessments.
During this reporting period, we began exploring this objective using our FTIR dataset. We submitted our first publication for the grant, which reports the development of a machine learning algorithm (Gaussian process) and is currently under review at SSSAJ (Tanner B. Beckstrom, Arianna Bunnell, Tai M. Maaz, Michael B. Kantar, Jonathan L. Deenik, Christine Tallamy Glazer, Peter Sadowski, Susan E. Crow. In review. Mid-infrared Spectroscopy and Machine Learning Improve Accessibility of Hawai'i Soil Health Assessment. SSSAJ.). This work will be highlighted in the keynote address of the 2024 ASA, CSSA, SSSA International Annual Meeting in San Antonio, TX, by co-PI Susan E. Crow (Rebuilding Health, Resilience, and Equity in Hawai'i's Agroecosystems, https://scisoc.confex.com/scisoc/2024am/meetingapp.cgi/Paper/156048). We have recruited and hired one PhD student to work on this objective, and we have built a relationship with faculty in the Department of Information and Computer Sciences. Given that this objective is contingent upon the previous objective, we will conduct most of this work in upcoming cycles.
3. To Build Pathways that Integrate Data Science into the Classroom and Beyond - Train undergraduate and graduate students in data science and data ethics through classroom instruction, internship opportunities, and assistantships.
We initially devoted considerable time to adopting a data policy and technical summary of our soil health database before recruiting students and developing ethics training. We adopted the data policy and technical summary on April 25, 2024.
Currently, we are training 1 post-doctoral researcher, 1 post-graduate, 1 PhD student, and 3 undergraduates. We recruited one of these undergraduate students from our previous internship program, and he is developing the Shiny app for soil health scoring. Students and postdocs working on this project have been trained to contribute to the repository, making them proficient in version control, reproducibility, and open science practices. This repository serves as a training ground for developing place-appropriate data science practices.
In Fall 2024, we also launched TPSS 333. The purpose of this course is to prepare students to learn and implement principles of data analysis, visualization, and mapping. Our target audience is students (in agriculture, environmental science, geography, sustainability studies, and beyond) interested in developing a critical framework for understanding different ways of measuring, visualizing, and interpreting the different goals humans have for agroecosystems. This course focuses on recent advances in analytical techniques that have allowed unprecedented exploration of the interdependence of our natural and human-mediated systems. As part of the course, we piloted an ethics training that uses a case-study approach, first featuring research on kalo (taro) and Indigenous data sovereignty.
Using our soil health database, we have also created a physical soil library space where students and researchers can access soil archetypes, which include unique combinations of soil mineralogy, current land management, and past land use. This library will support future research efforts.
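As an illustration of the identifier-based join described under Objective 1, Activity 4, the following is a minimal Python sketch of how two data streams might be joined on a cleaned soil sample ID. It is not the project's actual script: the function names (normalize_sample_id, join_streams), the field names (sample_id, fungal_richness, absorbance_1650), the ID cleaning rule, and all example values are hypothetical and chosen only for illustration.

```python
import re


def normalize_sample_id(raw_id):
    """Hypothetical cleaning rule: drop whitespace, underscores, and hyphens, then uppercase."""
    return re.sub(r"[\s_\-]+", "", str(raw_id)).upper()


def join_streams(microbiome_rows, ftir_rows, id_field="sample_id"):
    """Inner-join two lists of dict records on the normalized sample ID.

    Returns (joined, unmatched): joined holds one merged record per ID found
    in both streams; unmatched is the sorted list of IDs found in only one.
    If an ID repeats within a stream, the last record for that ID is kept.
    """
    micro_by_id = {normalize_sample_id(r[id_field]): r for r in microbiome_rows}
    ftir_by_id = {normalize_sample_id(r[id_field]): r for r in ftir_rows}

    joined = []
    for sid, micro in micro_by_id.items():
        if sid in ftir_by_id:
            merged = {id_field: sid}
            merged.update({k: v for k, v in micro.items() if k != id_field})
            merged.update({k: v for k, v in ftir_by_id[sid].items() if k != id_field})
            joined.append(merged)

    # IDs present in exactly one stream are reported for manual review.
    unmatched = sorted(set(micro_by_id) ^ set(ftir_by_id))
    return joined, unmatched


# Made-up records; the IDs differ only in case and separators.
microbiome = [
    {"sample_id": "hi-001", "fungal_richness": 120},
    {"sample_id": "HI_002", "fungal_richness": 95},
]
ftir = [
    {"sample_id": "HI001 ", "absorbance_1650": 0.42},
    {"sample_id": "HI003", "absorbance_1650": 0.37},
]

joined, unmatched = join_streams(microbiome, ftir)
print(joined)
print(unmatched)
```

Reporting the unmatched IDs rather than silently dropping them reflects the data cleaning step noted above: samples that appear in only one stream are the ones most likely to carry a mistyped or inconsistently formatted identifier.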
Publications