Progress 09/01/24 to 08/31/25
Outputs
Target Audience: During this reporting period, the project primarily reached internal research audiences, including a postgraduate researcher and a postdoctoral researcher on the project team. These individuals received training and mentorship in synthetic data generation, graph convolutional networks, and GAN-based translation, supporting their technical and professional development. In addition, the project prepared materials for broader dissemination through an accepted talk at the CANVAS 2025 meeting, which engaged the plant phenotyping and computational biology research communities. Thus, the target audiences reached include both the immediate project team and the broader scientific community positioned to benefit from the project's outputs.
Changes/Problems:
Nothing Reported
What opportunities for training and professional development has the project provided?
The project has created meaningful opportunities for training and professional development across career stages. A newly joined PhD student began their Graduate Research Assistantship in Fall 2025 and is being introduced to generative adversarial networks and point cloud segmentation, laying the foundation for incorporating these methods into their dissertation research. This PhD student presented a poster on this work at an internal UF AI conference and also presented at CANVAS 2025, an international research conference. A postdoctoral researcher has contributed to advancing the methodological framework, gaining additional experience in interdisciplinary research and project management. All students participate in project meetings and joint discussions across all aspects of the work, providing exposure to and experience in project leadership and management.

How have the results been disseminated to communities of interest?
The results have been disseminated through the creation of a public GitHub repository, which provides access to source code, data processing workflows, and implementation details for reproducibility. We presented our work at CANVAS 2025 and at UF AI Days 2025 to share findings with the plant phenotyping and computational biology communities. These efforts ensure both transparency and broad community impact.

What do you plan to do during the next reporting period to accomplish the goals?
During the next reporting period, we plan to advance our methodology along three complementary directions, all aimed at improving segmentation accuracy and realism in root-soil CT point clouds.

First, we will continue our early-stage experimentation on adaptive neighborhood selection in Graph Convolutional Neural Networks (GCNNs). Different points require different neighborhood sizes depending on their context: interior points in dense regions are relatively homogeneous and can be segmented with smaller neighborhoods, while boundary points and small disjointed components are more challenging and demand larger contextual information. To address this, we are developing an adaptive strategy that allows each point to dynamically select its effective neighborhood size. Our current experiments evaluate two approaches: (i) target-neighbor scoring, where a score is computed for a target point paired with each individual neighbor, and (ii) target-all-neighbor scoring, where scores are generated jointly over all K candidate neighbors and aggregated. The latter provides richer context by comparing neighbors to one another, helping the model identify which neighbors contribute most to accurate segmentation. These efforts are expected to improve segmentation quality in difficult regions such as fine root branches and soil-root boundaries. An illustrative sketch of this scoring idea appears below.

Second, we plan to incorporate priors on root thickness and length into the segmentation pipeline through a physics-informed loss function. Root segments are expected to fall within characteristic ranges of thickness and length, which vary with depth. Embedding these constraints into training will guide the network toward predictions that are consistent with realistic root morphology. This approach could reduce false positives and false negatives by steering the model toward outputs that align with structural expectations, not just point-level accuracy. A sketch of one possible form of this loss follows the neighborhood-scoring example.
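To make the first direction concrete, the following is a minimal sketch of per-point neighborhood scoring, assuming a PyTorch implementation; the names (AdaptiveNeighborScorer, knn_indices) and the small scoring MLP are illustrative placeholders rather than our final design. It computes a score for each (target, neighbor) pair and then normalizes the scores jointly across all K candidates, so that each point effectively weights, and thereby sizes, its own neighborhood.

    import torch
    import torch.nn as nn

    def knn_indices(points, k):
        # points: (N, 3); returns (N, k) indices of each point's k nearest neighbors
        dists = torch.cdist(points, points)                      # (N, N) pairwise distances
        return dists.topk(k + 1, largest=False).indices[:, 1:]   # drop the point itself

    class AdaptiveNeighborScorer(nn.Module):
        # Scores K candidate neighbors per point and gates their contributions.
        def __init__(self, feat_dim, k):
            super().__init__()
            self.k = k
            # (i) target-neighbor scoring: one score per (target, neighbor) pair
            self.pair_mlp = nn.Sequential(
                nn.Linear(2 * feat_dim, 64), nn.ReLU(), nn.Linear(64, 1))

        def forward(self, feats, idx):
            # feats: (N, F) per-point features, idx: (N, K) candidate neighbor indices
            target = feats.unsqueeze(1).expand(-1, self.k, -1)          # (N, K, F)
            neigh = feats[idx]                                          # (N, K, F)
            scores = self.pair_mlp(torch.cat([target, neigh], dim=-1))  # (N, K, 1)
            # (ii) target-all-neighbor scoring: normalizing across all K candidates
            # lets neighbors compete, so each point effectively chooses which
            # neighbors, and hence how large a neighborhood, to rely on
            weights = torch.softmax(scores, dim=1)                      # (N, K, 1)
            return (weights * neigh).sum(dim=1)                         # (N, F) aggregated feature

    # Toy usage
    pts = torch.rand(1024, 3)
    feats = torch.rand(1024, 32)
    out = AdaptiveNeighborScorer(feat_dim=32, k=16)(feats, knn_indices(pts, 16))

A hard top-k selection over the same scores would be one way to turn this soft weighting into a truly variable neighborhood size.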
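Likewise, the sketch below shows one possible form of the physics-informed prior, again assuming PyTorch. Using the per-depth-bin root fraction as a proxy for root thickness, and the specific bounds shown, are illustrative assumptions rather than our final formulation; the intent is only to show how depth-dependent morphological ranges can enter the loss differentiably.

    import torch

    def root_prior_loss(root_prob, z, lower, upper, n_bins=10):
        # root_prob: (N,) predicted probability that each point belongs to a root
        # z: (N,) point depths normalized to [0, 1]
        # lower, upper: (n_bins,) expected min/max root fraction per depth bin
        bin_idx = (z * n_bins).clamp(max=n_bins - 1e-6).long()           # depth bin per point
        frac = torch.zeros(n_bins).index_add(0, bin_idx, root_prob)      # soft root count per bin
        counts = torch.zeros(n_bins).index_add(0, bin_idx, torch.ones_like(root_prob))
        frac = frac / counts.clamp(min=1.0)                              # soft root fraction per bin
        # hinge-style penalty: zero inside [lower, upper], growing linearly outside
        return (torch.relu(lower - frac) + torch.relu(frac - upper)).mean()

    # Toy usage: expected root fraction shrinking with depth
    probs = torch.rand(2048, requires_grad=True)
    depth = torch.rand(2048)
    lo, hi = torch.linspace(0.05, 0.01, 10), torch.linspace(0.30, 0.05, 10)
    root_prior_loss(probs, depth, lo, hi).backward()

In practice a term like this would be added with a small weight to the standard segmentation loss, so that it nudges rather than dominates the point-level objective.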
Third, we aim to improve the realism of synthetic-to-CT translations by developing enhanced loss functions for our GC-GAN. Current translations sometimes exhibit domain shift, where generated point clouds deviate from the density and structural patterns of real CT scans. By refining the distribution matching losses, we expect to generate more realistic CT-like point clouds. This will reduce the gap between synthetic and real data, allowing synthetic data to contribute more effectively to segmentation training. One possible matching term is sketched at the end of this subsection.

Together, these three directions (adaptive neighborhoods, physics-informed constraints, and improved GAN-based translations) are designed to strengthen the accuracy and biological validity of segmentation while reducing domain shift. Ultimately, these advances will enable more reliable extraction of root traits from soil cores, supporting high-throughput plant phenotyping.
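As an illustration of the third direction, the sketch below shows one commonly used distribution-matching term, a symmetric Chamfer distance between a translated core and a real CT core, assuming PyTorch. The refined GC-GAN losses we plan to develop may combine several such terms (for example, density- or coverage-based penalties) and are not limited to this form.

    import torch

    def chamfer_distance(gen, real):
        # gen: (N, 3) translated synthetic points, real: (M, 3) real CT points
        d = torch.cdist(gen, real)                  # (N, M) pairwise distances
        return d.min(dim=1).values.mean() + d.min(dim=0).values.mean()

    # Toy usage: penalize translated clouds that drift from the real density pattern
    gen = torch.rand(1000, 3, requires_grad=True)
    real = torch.rand(1200, 3)
    chamfer_distance(gen, real).backward()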
Impacts
What was accomplished under these goals?
Root Architecture Modeling
CropRootBox, a root system architecture model built using the Cropbox framework in Julia, was used to generate the synthetic root data for model training. To simulate switchgrass root architecture, the base model parameters were estimated from previous experiments on the VS16 and WBC genotypes of Panicum virgatum. Parameters such as root thickness were adjusted to better match the distributions seen in the CT scan data. Root growth was simulated with an hourly timestep and then exported as a three-dimensional mesh. Virtual soil cores matching the dimensions of the real cores were taken from several points around the edge of the crown diameter of the simulated plant. Voxelization was used to transform the mesh into the 3D point cloud format needed for model training. Modifications to the original model code include the addition of rhizomes as a class to better capture the rhizomatous nature of switchgrass and allow for more complex root architecture. Because the CT scans lack the detail to capture them, very fine roots were removed from the simulation.

Model-Informed GCNN Development
We developed an architecture to integrate synthetic root architecture modeling data with real soil core CT scans of switchgrass. Root architecture modeling provides synthetic point clouds containing only root structures, while real CT data contain both roots and soil in complex arrangements. Manually labeling thousands of points in real soil cores is extremely time-consuming and impractical. Therefore, we rely on synthetic data to guide the segmentation process and reduce annotation effort, while still capturing the realism of soil-root interactions in natural cores.

To achieve this integration, we first designed a Graph Convolution-based Generative Adversarial Network (GC-GAN). The GC-GAN takes synthetic root-only point clouds as input and translates them into realistic point clouds resembling CT scans of soil cores containing both roots and soil. The advantage of these translated point clouds is that they carry precise pointwise labels inherited from the synthetic data: every point in the translated cloud is annotated, which allows us to treat this data as strongly supervised.

Second, we employed a Dynamic Graph Convolutional Neural Network (DGCNN) for segmentation. The DGCNN is trained jointly on both translated synthetic data and real CT scans. For the real CT scans, we assume only weak supervision, where labels provide approximate localization of roots but not precise pointwise boundaries. For the translated synthetic data, strong supervision is available, which encourages the network to learn fine-grained segmentation. By combining both sources, the model learns to generalize across the noisy, heterogeneous distributions of real data. During inference, the joint training strategy enables the network to generate more accurate segmentation maps of real soil cores. A sketch of this joint objective appears at the end of this section.

We conducted evaluations to compare three training strategies:
1. Synthetic-only training: Models trained solely on synthetic data performed poorly on real CT scan segmentation due to domain shift. Although the GC-GAN aims to generate data that is as realistic as possible, the distributions of real and generated CT scans are not identical.
2. Real-only training (weak supervision): Training only on real CT scans with weak labels localized roots coarsely but lacked fine-grained accuracy. The weak supervision does not properly constrain point-level predictions, leading to under-segmentation of roots and inclusion of soil as false positives.
3. Joint synthetic + real training (proposed approach): Combining translated synthetic data and weakly labeled real CT scans produced the best results. Synthetic data enforced point-level structural detail, while real data aligned the model to the true distribution of soil-root patterns. This combination led to superior segmentation accuracy compared to either source alone.

Beyond the integration of data sources, we faced several challenges in developing GC-GAN for realistic translation. The translation process requires not only embedding roots in soil but also generating soil points that preserve the density and distribution patterns observed in real CT scans. Specifically, soil points must surround roots in a natural configuration that mimics cores extracted from the ground. A naïve generation of soil points often results in unrealistic densities and uneven spatial distributions. To address this, we adopted a candidate soil point selection mechanism. Initially, the background is filled with candidate soil points, subject to minimum separation constraints to ensure physically plausible density. The GC-GAN then applies a differentiable argmax mechanism to select the most plausible soil points from this candidate pool. This mechanism allows gradient-based training while enforcing discrete soil point selection. The discriminator of the GC-GAN evaluates the realism of translated cores by comparing them to true CT scans, thereby pushing the generator to learn physically consistent soil placement. This combination of structured candidate generation and adversarial refinement was key to producing realistic soil-root point clouds. A sketch of the differentiable selection idea is included below.

Overall, our framework so far demonstrates that combining synthetic and real data provides a powerful strategy for root phenotyping in soil cores. Synthetic data offers scalability and precise labels, while real CT scans provide realism. By bridging the gap with GC-GAN and leveraging the segmentation power of DGCNN, we achieve markedly improved accuracy over traditional approaches. The methodology not only advances automated root segmentation but also lays the foundation for high-throughput phenotyping, enabling the extraction of critical traits such as root length, branching, and density from CT scans. These traits can in turn accelerate research in plant biology, crop breeding, and sustainable agriculture.
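The sketch below illustrates the joint strong/weak supervision objective described above, assuming PyTorch. The strong term is ordinary per-point cross-entropy on translated synthetic cores; the weak term shown here is a multiple-instance-style loss over coarsely labeled regions, which is an assumed stand-in for our weak-supervision formulation rather than its exact form.

    import torch
    import torch.nn.functional as F

    def strong_loss(logits_syn, labels_syn):
        # logits_syn: (N, 2) per-point root/soil logits, labels_syn: (N,) exact labels
        return F.cross_entropy(logits_syn, labels_syn)

    def weak_loss(logits_real, region_idx, region_has_root):
        # region_idx: (M,) coarse region id per real point
        # region_has_root: (R,) 1 if a region is known to contain roots, else 0
        probs = F.softmax(logits_real, dim=-1)[:, 1]            # per-point root probability
        losses = []
        for r in range(region_has_root.numel()):
            p = probs[region_idx == r]
            if p.numel() == 0:
                continue
            if region_has_root[r] > 0:
                losses.append(-torch.log(p.max() + 1e-8))       # at least one root point expected
            else:
                losses.append(-torch.log(1 - p + 1e-8).mean())  # no root points expected
        return torch.stack(losses).mean()

    def joint_loss(logits_syn, labels_syn, logits_real, region_idx, region_has_root, w=0.5):
        # strong supervision from translated synthetic data plus a weighted weak term on real data
        return strong_loss(logits_syn, labels_syn) + w * weak_loss(
            logits_real, region_idx, region_has_root)

    # Toy usage
    loss = joint_loss(torch.randn(500, 2), torch.randint(0, 2, (500,)),
                      torch.randn(800, 2), torch.randint(0, 10, (800,)),
                      torch.randint(0, 2, (10,)))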
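Finally, the sketch below illustrates differentiable selection over a candidate soil-point pool. A straight-through Gumbel-softmax is one standard way to realize a differentiable argmax: the forward pass makes discrete selections while gradients still flow to the generator. The scoring network, the iterative masking, and the names used here are illustrative assumptions, not our exact GC-GAN implementation.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class SoilPointSelector(nn.Module):
        # Selects the most plausible soil points from a pre-generated candidate pool.
        def __init__(self, feat_dim):
            super().__init__()
            self.score = nn.Sequential(nn.Linear(feat_dim, 64), nn.ReLU(), nn.Linear(64, 1))

        def forward(self, cand_xyz, cand_feat, n_select, tau=1.0):
            # cand_xyz: (C, 3) candidate coordinates (already filtered for minimum separation)
            # cand_feat: (C, F) candidate features, n_select: number of soil points to keep
            logits = self.score(cand_feat).squeeze(-1)                 # (C,) plausibility scores
            selected = []
            for _ in range(n_select):
                # hard=True: one-hot (discrete) choice forward, soft gradients backward
                onehot = F.gumbel_softmax(logits, tau=tau, hard=True)  # (C,)
                selected.append(onehot @ cand_xyz)                     # (3,) chosen coordinates
                logits = logits - 1e4 * onehot.detach()                # mask out the chosen candidate
            return torch.stack(selected)                               # (n_select, 3)

    # Toy usage
    selector = SoilPointSelector(feat_dim=16)
    soil_points = selector(torch.rand(500, 3), torch.rand(500, 16), n_select=64)

The selected soil points, together with the synthetic root points, would then be assembled into a translated core and passed to the discriminator, which compares it against real CT scans.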
Publications