Progress 09/15/24 to 09/14/25
Outputs
Target Audience:
The primary target audiences during this reporting period include:
(1) Conferences: IEEE International Conference on Artificial Intelligence (IEEE CAI-2025); ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems (SIGSPATIAL-2024, SIGSPATIAL-2025); IEEE eScience Conference (e-Science-2025); IEEE International Conference on Data Science and Advanced Analytics (DSAA-25); IEEE/ACM International Conference on Big Data Computing, Applications and Technologies (BDCAT-2024); AAAI Conference on Artificial Intelligence (AAAI-2025); IEEE International Conference on Big Data (BigData2024)
(2) Journals: Remote Sensing (MDPI), Hydrology (MDPI), Journal of Hydrology
(3) Interdisciplinary collaboration: This project is based on close collaboration across agricultural science, civil and environmental engineering, and computer science. Our PIs have held a project-wide meeting every month during the semester, in addition to ad hoc meetings to troubleshoot challenges. We have had more than 30 additional ad hoc meetings among different subsets of the PIs and their research group members. During the reporting period, the PI and her graduate students held two regular meetings and one to two group meetings each week.
Changes/Problems:
Nothing Reported
What opportunities for training and professional development has the project provided?
During the reporting period, we provided the following training and professional development opportunities.
1. Ph.D. students
Paahuni Khandelwal has been trained in AI-based modeling with multi-modal datasets to estimate current root zone soil moisture levels. She graduated in Spring 2025. Tanjim Faruk has been trained in vision modeling that combines in-situ observations with the scientific model. Rupasree Dey has been trained in multi-modal deep learning with time-series datasets as part of this project. Abdul Matin has been trained in knowledge distillation and domain adaptation in the context of soil moisture estimation, with a particular focus on extending foundation models. Andrei Bachinin has been working on using spectral images to understand soil properties and estimate soil moisture content.
2. M.S. students
Cavin Alderfer and Samuel Swing were trained in the use of APSIM for simulating crop growth. Cavin finished his studies by completing two graduate certificates. A professional development plan was developed for and with Samuel Swing, and we are actively working towards accomplishing the goals laid out in his plan. Matthew Young and Freddy Larrieu were trained in developing a cyberinfrastructure for data visualization. Both students completed their M.S. degrees in Spring 2025. Everett Lewark has been trained in developing a framework for multi-resolution spatial datasets as part of the HERMES framework.
How have the results been disseminated to communities of interest?
A. Journal articles and papers published in proceedings of conferences
[1] Paahuni Khandelwal, Jeffrey D. Niemann, David J. Mulla, Shrideep Pallickara, Sangmi Lee Pallickara, SUBTERRA: Estimating Soil Moisture at Root Zone Depths Using Science-guided Learning, In Proceedings of the IEEE Conference on Artificial Intelligence (IEEE CAI), 2025.
[2] Andrei Bachinin, Rupasree Dey, Paahuni Khandelwal, Sam Leuthold, M. Francesca Cotrufo, Shrideep Pallickara, and Sangmi Lee Pallickara, Science-Informed Multitask Transformer for Soil Property Prediction from FTIR Spectroscopy, (To appear) Proceedings of the IEEE eScience Conference, 2025.
[3] Tanjim Faruk, Abdul Matin, Shrideep Pallickara, and Sangmi Lee Pallickara, Accounting for Spatial Variability with the Histogram of Oriented Gradients Based Masking Improves Performance of Masked Autoencoder over Hyperspectral Satellite Imagery, student poster and abstract, In Proceedings of the AAAI Conference on Artificial Intelligence, 2025.
[4] Hansen, Paige, Nathan Orwick, Kassidy Barram, Pierce Smith, Jay Breidt, Sangmi Lee Pallickara, and Shrideep Pallickara. "Archimedes: A Framework to Support Distributional Similarity Analysis over Arbitrary Spatiotemporal Scopes at Scale." In 2025 IEEE 25th International Symposium on Cluster, Cloud and Internet Computing (CCGrid), pp. 215-225. IEEE, 2025.
[5] Paahuni Khandelwal, Sangmi Lee Pallickara, Shrideep Pallickara, DeepSoil: A Science-guided Framework for Generating High Precision Soil Moisture Maps by Reconciling Measurement Profiles Across In-situ and Remote Sensing Data, In Proceedings of the ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, 2024. Finalist for the Best Paper Award.
[6] Federico Larrieu, Tyson O'Leary, Sangmi Lee Pallickara, and Shrideep Pallickara, MAGELLAN: Enabling Effective Search Over Voluminous, High-Dimensional Scientific Datasets, In Proceedings of the IEEE/ACM International Conference on Big Data Computing, Applications and Technologies (BDCAT), 2024.
[7] Everett Lewark, Matthew Young, Paahuni Khandelwal, Sangmi Lee Pallickara, and Shrideep Pallickara, Periscope: A Framework for Visualizations of Multiresolution Spatiotemporal Data at Scale, In Proceedings of the IEEE International Conference on Big Data (IEEE BigData), 2024.
[8] Kassidy Barram, Sangmi Lee Pallickara, Shrideep Pallickara, SCRYBE: Enabling Programmatic Interfaces for Explorations Over Voluminous Spatiotemporal Data Collections, In Proceedings of the IEEE/ACM International Conference on Big Data Computing, Applications and Technologies (BDCAT), 2024.
[9] Sahaar, A.S., and J.D. Niemann, 2024, "Estimating Rootzone Soil Moisture by Fusing Multiple Remote Sensing Products with Machine Learning," Remote Sensing, 16, 3699, doi: 10.3390/rs16193699.
[10] Bindner, J.R., H.E. Proulx, K. Wickham, J.D. Niemann, J. Scalia, T.R. Green, and P.J. Grazaitis, 2025, "Dependence of Soil Moisture and Strength on Topography and Vegetation Varies within a SMAP Grid Cell," Hydrology, 12(34), 1-24, doi: 10.3390/hydrology12020034.
[11] Fischer, S., J.D. Niemann, J. Scalia, M.D. Bullock, H.E. Proulx, B. Kim, T.R. Green, and P.J. Grazaitis, 2025, "Assessing the Influence of Model Inputs on Performance of the EMT+VS Soil Moisture Downscaling Model for a Large Foothills Region in Northern Colorado," Journal of Hydrology, 650, 132397, doi: 10.1016/j.jhydrol.2024.132397.
[12] Faruk, Tanjim Bin, Abdul Matin, Shrideep Pallickara, and Sangmi Lee Pallickara. "TerraMAE: Learning Spatial-Spectral Representations from Hyperspectral Earth Observation Data via Adaptive Masked Autoencoders." (To appear) In Proceedings of the ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, 2025.
B. Dissertations and Theses
[1] Paahuni Khandelwal, Large-scale Predictive Modeling for Spatiotemporally Evolving Phenomena. Doctoral Dissertation, Computer Science, Colorado State University, 2025
[2] Frederic Larrieu, Enabling Effective Search Over Voluminous, High-dimensional Scientific Datasets, Masters Thesis, Computer Science, Colorado State University, 2025
[3] Kati Patterson, Time series Analysis over Sparse, Non-stationary Datasets with Variational Mode Decomposition and Transfer Learning, Masters Thesis, Computer Science, Colorado State University, 2025
[4] Kassidy Barram, Enabling Programmatic Interfaces for Voluminous Spatiotemporal Data Collections, Masters Thesis, Computer Science, Colorado State University, 2025
[5] Matthew Young, Rapid & Interactive Explorations of Voluminous Spatial Temporal Datasets, Masters Thesis, Computer Science, Colorado State University, 2025
[6] Tanjim Faruk, Towards Generating A Pre-training Image Transformer Framework for Preserving Spatio-Spectral Properties in Hyperspectral Satellite Images, Masters Thesis, Computer Science, Colorado State University, 2025
C. Web-based Soil Digital Twin for Soil Moisture
This is an ongoing effort. To seek feedback, we have made the current version, which includes the soil property mapping capability, available at: https://soiltwin.org/soils-in-silico/
What do you plan to do during the next reporting period to accomplish the goals?
- Develop Vision AI-based models for root-zone soil moisture forecasting.
- Generate a training data corpus from the process-based model.
- Measure and estimate the uncertainty of the root-zone soil moisture models.
- Extend our HERMES framework to ingest model outputs with an effective model inference workflow.
Impacts
What was accomplished under these goals?
(1) Develop models for root zone soil moisture estimation (Objectives 1, 2, and 3)
During the reporting period, our modeling effort focused on developing a deep learning-based model that estimates soil moisture within the root zone. Root zone soil moisture is a key driver of decision making for winter wheat growers. It indicates the actual water available where winter wheat will be seeded. If the root zone soil moisture level is not suitable for winter wheat, growers may consider different crop rotation plans. We have developed SUBTERRA, a science-guided deep learning framework designed to estimate soil moisture (SM) within the top 20 cm of the root zone. The key contribution of this activity is to overcome limitations of traditional data-driven models by integrating scientific models, particularly Richards' equation, into machine learning models. We combined sparse, high-precision in-situ sensor data with ancillary datasets such as satellite observations (SMAP), meteorological data (GridMET), soil hydraulic properties (POLARIS), and land-cover information (NLCD) to construct a more robust and physically consistent predictive model.
Our methodology includes a Graph Neural Network (GNN) capable of handling heterogeneous and temporally misaligned data. SUBTERRA models spatial and temporal relationships through a graph structure in which nodes represent diverse environmental variables and sensor readings, and edges encode physical relationships such as water movement between soil layers. The framework incorporates 24-hour soil moisture dynamics generated by solving Richards' equation to enhance the model's learning capacity, especially for depths where sensor data are missing or sparse. In SUBTERRA, we discretize the soil profile into 1 cm increments and simulate hourly soil moisture changes based on boundary conditions derived from in-situ measurements, meteorological inputs, and soil hydraulic properties.
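As a rough illustration of the simulation step described above, the following Python sketch advances a 20 cm profile, discretized into 1 cm cells, through 24 hours and records hourly snapshots. It is not the SUBTERRA implementation: the moisture-based form of Richards' equation with a constant diffusivity, the power-law conductivity function, the free-drainage lower boundary, the explicit time stepping, and every parameter value and function name are assumptions chosen for brevity.

```python
def conductivity(theta, k_sat=1.0, theta_r=0.05, theta_s=0.45, exponent=3.0):
    """Assumed power-law K(theta) in cm/hr; all parameter values are placeholders."""
    saturation = (theta - theta_r) / (theta_s - theta_r)
    saturation = min(max(saturation, 0.0), 1.0)  # keep the power law well defined
    return k_sat * saturation ** exponent


def simulate_profile(theta0, infiltration, dz=1.0, diffusivity=5.0, substeps=60):
    """Explicit finite-volume update of d(theta)/dt = -d(q)/dz.

    theta0       : initial volumetric moisture per cell (top cell first)
    infiltration : one surface flux value (cm/hr) per simulated hour
    Returns (hourly_profiles, drained_cm).
    """
    theta = list(theta0)
    n = len(theta)
    dt = 1.0 / substeps  # sub-step length in hours
    hourly = [list(theta)]
    drained = 0.0
    for q_top in infiltration:
        for _ in range(substeps):
            # Downward fluxes (cm/hr) at the n + 1 cell interfaces.
            flux = [0.0] * (n + 1)
            flux[0] = q_top  # upper boundary: prescribed infiltration
            for i in range(1, n):
                k_mid = 0.5 * (conductivity(theta[i - 1]) + conductivity(theta[i]))
                # Diffusive term plus gravity-driven term.
                flux[i] = -diffusivity * (theta[i] - theta[i - 1]) / dz + k_mid
            flux[n] = conductivity(theta[n - 1])  # lower boundary: free drainage (assumed)
            for i in range(n):
                theta[i] += dt * (flux[i] - flux[i + 1]) / dz
            drained += flux[n] * dt
        hourly.append(list(theta))
    return hourly, drained
```

For example, `simulate_profile([0.25] * 20, [0.05] * 24)` returns 25 hourly profiles together with the cumulative drainage. An explicit scheme like this one is only stable for small sub-steps; a production solver would more likely use an implicit scheme and measured soil hydraulic properties.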
The Richards' equation simulations provide training targets and input features for the GNN model, allowing it to learn from physically realistic patterns even in the absence of direct measurements. Our model, SUBTERRA, integrates scientific models into machine learning through data augmentation as well as extended training objectives. Our approach incorporates knowledge-guided loss functions that penalize deviations from physical laws, thereby enforcing scientific consistency in the model's predictions. This hybrid learning strategy enables SUBTERRA to deliver high-accuracy soil moisture estimates even under conditions of severe data sparsity and heterogeneity.
(2) Designing and developing an intuitive cyberinfrastructure for scientists, growers, and other stakeholders (Objective 4)
As part of this project, we have developed a cyberinfrastructure for visualization and analytics of soil moisture (https://soiltwin.org/soils-in-silico/). Our framework, HERMES, provides a digital twin of soil at the CONUS scale, targeting estimation of soil properties and moisture content. HERMES is based on the fundamental assumption that no single data store is suited to all forms of soil-related information. Our design is based on polyglot persistence, distributing datasets across storage systems based on their fitness for specific data modalities. Structured data, such as tabular results from soil surveys or metadata, reside in relational databases (PostgreSQL), where schema and indexing are well understood. Semi-structured data, including observational records, boundaries, and vector shapes, are managed using document stores (MongoDB). Raster data, with its emphasis on spatial colocation and voluminous characteristics, is stored in a tiled form across distributed key-value stores (Cassandra and Redis), enabling both parallel access and scalable partitioning.
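The routing idea behind polyglot persistence can be sketched as follows. The modality names, the `BACKENDS` table, the one-degree tile size, and the key layout are hypothetical and are not taken from HERMES; the sketch only shows how a dataset could be routed by modality and how a raster location could be mapped to a tile key suitable for a key-value store.

```python
import math
from enum import Enum


class Modality(Enum):
    STRUCTURED = "structured"
    SEMI_STRUCTURED = "semi_structured"
    RASTER = "raster"


# Hypothetical routing table mirroring the stores named in the text.
BACKENDS = {
    Modality.STRUCTURED: ("postgresql",),
    Modality.SEMI_STRUCTURED: ("mongodb",),
    Modality.RASTER: ("cassandra", "redis"),
}


def backends_for(modality):
    """Return the store(s) assumed to hold data of the given modality."""
    return BACKENDS[modality]


def tile_key(dataset, lon, lat, tile_deg=1.0):
    """Map a lon/lat point to the key of the regular lat/lon tile containing it.

    Rows count southward from 90N and columns eastward from 180W, so points in
    the same tile share one key and can be fetched with a single lookup.
    """
    if not (-180.0 <= lon <= 180.0 and -90.0 <= lat <= 90.0):
        raise ValueError("coordinates out of range")
    col = math.floor((lon + 180.0) / tile_deg)
    row = math.floor((90.0 - lat) / tile_deg)
    return f"{dataset}/{row}/{col}"
```

With these assumptions, two nearby points such as (-105.08, 40.57) and (-105.9, 40.1) fall in the same one-degree tile and therefore share a key, which is the spatial colocation property the tiled layout relies on.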
To support unified analysis across these heterogeneous data systems, we introduce a spatiotemporal scoping mechanism that imposes a unifying metadata abstraction. Every data element, regardless of format or storage backend, has a spatiotemporal scope associated with it, i.e., it is tagged with a well-defined spatial footprint and temporal window. This spatiotemporal scope serves as a bridge across systems, allowing datasets to be located, aligned, and queried in a federated fashion. Temporal reconciliation is handled explicitly. We have designed a federated query evaluation model in which sub-queries are pushed down to their respective data stores and executed natively. Each store's own language and indexing strategy are leveraged directly: SQL in PostgreSQL, MQL over BSON documents in MongoDB, and spatially constrained key-range lookups in systems such as Cassandra or Redis. The architecture also accommodates specialized indexing structures tailored to the capabilities of each store. These include B+ trees for efficient range queries in relational systems and 2dsphere indexes in document stores, which support spherical geometry operations that account for the curvature of the Earth and are essential for distance operators and computations. This native execution avoids premature data movement and exploits the optimization strategies of each backend. However, the integration of such results is a challenge.
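A minimal sketch of the scoping and push-down idea described above is given here. The `Scope` and `Dataset` classes, the catalog, and the `plan_subqueries` function are illustrative names rather than the HERMES API, and the bounding-box footprint ignores complications such as antimeridian crossings.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Scope:
    """Spatial footprint (bounding box in degrees) plus a temporal window."""
    west: float
    south: float
    east: float
    north: float
    start: datetime
    end: datetime

    def intersects(self, other):
        spatial = (self.west <= other.east and other.west <= self.east
                   and self.south <= other.north and other.south <= self.north)
        temporal = self.start <= other.end and other.start <= self.end
        return spatial and temporal


@dataclass(frozen=True)
class Dataset:
    name: str
    backend: str  # store that would execute the sub-query natively
    scope: Scope


def plan_subqueries(catalog, query_scope):
    """Group the datasets whose scope overlaps the query by their backend.

    Each group stands for one sub-query pushed down to that store; merging the
    per-store results is deliberately left out of this sketch.
    """
    plan = defaultdict(list)
    for dataset in catalog:
        if dataset.scope.intersects(query_scope):
            plan[dataset.backend].append(dataset.name)
    return dict(plan)
```

Only the scope metadata is consulted during planning, so no data leaves a backend until its sub-query has run there.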
Publications
- Type:
Conference Papers and Presentations
Status:
Published
Year Published:
2024
Citation:
Barram, Kassidy, Sangmi Lee Pallickara, and Shrideep Pallickara. "Scrybe: Enabling Programmatic Interfaces for Explorations Over Voluminous Spatiotemporal Data Collections." In 2024 IEEE/ACM International Conference on Big Data Computing, Applications and Technologies (BDCAT), pp. 248-257. IEEE, 2024.
- Type:
Peer Reviewed Journal Articles
Status:
Published
Year Published:
2025
Citation:
Fischer, Samantha C., Jeffrey D. Niemann, Joseph Scalia, Matthew D. Bullock, Holly E. Proulx, Boran Kim, Timothy R. Green, and Peter J. Grazaitis. "Assessing the influence of model inputs on performance of the EMT+VS soil moisture downscaling model for a large foothills region in Northern Colorado." Journal of Hydrology 650 (2025): 132397.
- Type:
Conference Papers and Presentations
Status:
Published
Year Published:
2024
Citation:
Larrieu, Federico, Tyson O'Leary, Sangmi Lee Pallickara, and Shrideep Pallickara. "Magellan: Enabling Effective Search Over Voluminous, High-dimensional Scientific Datasets." In 2024 IEEE/ACM International Conference on Big Data Computing, Applications and Technologies (BDCAT), pp. 218-227. IEEE, 2024.
- Type:
Conference Papers and Presentations
Status:
Accepted
Year Published:
2025
Citation:
Faruk, Tanjim Bin, Abdul Matin, Shrideep Pallickara, and Sangmi Lee Pallickara. "Accounting for Spatial Variability with the Histogram of Oriented Gradients Based Masking Improves Performance of Masked Autoencoder over Hyperspectral Satellite Imagery (Student Abstract)." In Proceedings of the AAAI Conference on Artificial Intelligence, vol. 39, no. 28, pp. 29365-29367. 2025.
- Type:
Conference Papers and Presentations
Status:
Published
Year Published:
2025
Citation:
Khandelwal, Paahuni, Jeffrey D. Niemann, David J. Mulla, Shrideep Pallickara, and Sangmi Lee Pallickara. "Subterra: Estimating Soil Moisture at Root Zone Depths Using Science-Guided Learning." In 2025 IEEE Conference on Artificial Intelligence (CAI), pp. 328-335. IEEE, 2025.
- Type:
Peer Reviewed Journal Articles
Status:
Published
Year Published:
2025
Citation:
Bindner, Joseph R., Holly Proulx, Kevin Wickham, Jeffrey D. Niemann, Joseph Scalia IV, Timothy R. Green, and Peter J. Grazaitis. "Dependence of Soil Moisture and Strength on Topography and Vegetation Varies Within a SMAP Grid Cell." Hydrology 12, no. 2 (2025): 34.
- Type:
Conference Papers and Presentations
Status:
Published
Year Published:
2024
Citation:
Lewark, Everett, Matthew Young, Paahuni Khandelwal, Sangmi Lee Pallickara, and Shrideep Pallickara. "Periscope: A Framework for Visualizations of Multiresolution Spatiotemporal Data at Scale." In 2024 IEEE International Conference on Big Data (BigData), pp. 1373-1380. IEEE, 2024.
- Type:
Peer Reviewed Journal Articles
Status:
Published
Year Published:
2024
Citation:
Sahaar, Shukran A., and Jeffrey D. Niemann. "Estimating Rootzone Soil Moisture by Fusing Multiple Remote Sensing Products with Machine Learning." Remote Sensing 16, no. 19 (2024): 3699.
- Type:
Conference Papers and Presentations
Status:
Published
Year Published:
2025
Citation:
Hansen, Paige, Nathan Orwick, Kassidy Barram, Pierce Smith, Jay Breidt, Sangmi Lee Pallickara, and Shrideep Pallickara. "Archimedes: A Framework to Support Distributional Similarity Analysis over Arbitrary Spatiotemporal Scopes at Scale." In 2025 IEEE 25th International Symposium on Cluster, Cloud and Internet Computing (CCGrid), pp. 215-225. IEEE, 2025.
- Type:
Conference Papers and Presentations
Status:
Accepted
Year Published:
2025
Citation:
Andrei Bachinin, Rupasree Dey, Paahuni Khandelwal, Sam Leuthold, M. Francesca Cotrufo, Shrideep Pallickara, and Sangmi Lee Pallickara, Science-Informed Multitask Transformer for Soil Property Prediction from FTIR Spectroscopy, (To appear) Proceedings of the IEEE eScience Conference, 2025.
- Type:
Conference Papers and Presentations
Status:
Accepted
Year Published:
2025
Citation:
Tanjim Faruk, Abdul Matin, Shrideep Pallickara, Sangmi Lee Pallickara, TerraMAE: Learning Spatial-Spectral Representations from Hyperspectral Earth Observation Data via Adaptive Masked Autoencoders, (To appear) Proceedings of the 33rd ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems (SIGSPATIAL 2025), Minneapolis, MN, 2025.
- Type:
Theses/Dissertations
Status:
Published
Year Published:
2025
Citation:
Paahuni Khandelwal, Large-scale Predictive Modeling for Spatiotemporally Evolving Phenomena. Doctoral Dissertation, Computer Science, Colorado State University, 2025
- Type:
Theses/Dissertations
Status:
Published
Year Published:
2025
Citation:
Frederic Larrieu, Enabling Effective Search Over Voluminous, High-dimensional Scientific Datasets, Masters Thesis, Computer Science, Colorado State University, 2025
- Type:
Theses/Dissertations
Status:
Published
Year Published:
2025
Citation:
Kati Patterson, Time series Analysis over Sparse, Non-stationary Datasets with Variational Mode Decomposition and Transfer Learning, Masters Thesis, Computer Science, Colorado State University, 2025
- Type:
Theses/Dissertations
Status:
Published
Year Published:
2025
Citation:
Kassidy Barram, Enabling Programmatic Interfaces for Voluminous Spatiotemporal Data Collections, Masters Thesis, Computer Science, Colorado State University, 2025
- Type:
Theses/Dissertations
Status:
Published
Year Published:
2025
Citation:
Matt Young, Rapid & Interactive Explorations of Voluminous Spatial Temporal Datasets, Masters Thesis, Computer Science, Colorado State University, 2025
- Type:
Theses/Dissertations
Status:
Published
Year Published:
2025
Citation:
Tanjim Faruk, Towards Generating A Pre-training Image Transformer Framework for Preserving Spatio-Spectral Properties in Hyperspectral Satellite Images, Masters Thesis, Computer Science, Colorado State University, 2025
- Type:
Websites
Status:
Other
Year Published:
2025
Citation:
https://soiltwin.org/soils-in-silico/
Interactive Web-based visual analytics environment for soil moisture maps over CONUS