Source: UNIV OF PENNSYLVANIA submitted to NRP
NRI: ROBOTICS, SCIENCE AND TECHNOLOGY FOR FORESTRY
Sponsoring Institution
National Institute of Food and Agriculture
Project Status
ACTIVE
Funding Source
Reporting Frequency
Annual
Accession No.
1027761
Grant No.
2022-67021-36856
Cumulative Award Amt.
$1,000,000.00
Proposal No.
2021-10933
Multistate No.
(N/A)
Project Start Date
Mar 1, 2022
Project End Date
Feb 28, 2026
Grant Year
2022
Program Code
[A7301]- National Robotics Initiative
Recipient Organization
UNIV OF PENNSYLVANIA
(N/A)
PHILADELPHIA, PA 19104
Performing Department
Electrical and Systems Engineering
Non Technical Summary
"If a tree falls in the forest, and there is no one to hear it, does it make a sound?" While the question is decidedly philosophical, it conveys our goal here perfectly. But we also want to know a few other things, e.g., how large the tree was, what kind it was, and how much would it have grown in the future. We want to be able to answer such questions for the entire forest. Doing so is the key to understanding the carbon ecosystem and its impact on climate change, to advance forest science and to provide decision making tools for the $325B global forestry industry. To do so, it is necessary to (a) build detailed maps that can help understand a region as large as an entire forest and update these maps regularly to capture changes in the forest, (b) develop a fine-grained understanding of the shape and size of the vegetation in the forest, a catalogue of the various species of trees and shrubs in the forest, and (c) develop new methods to understand whether there is sufficient diversity in the forest, the health of the forest and its grown rate.We are a team of roboticists and machine learning researchers from the University of Pennsylvania and forestry experts from the Virginia Polytechnic Institute and State University. We envision a team of unmanned aerial vehicles (UAVs/drones), each equipped with sensors like high-resolution cameras that can build a map a given region quickly as complex as a forest. This team of drones, aided by forest scientist can build maps of areas as large as 1000 acres by flying under the forest canopy. We envision that these drones will be able to produce fine-grained maps of small, user-chosen, areas of forest, e.g., catalogue the diversity of vegetation, identify if there are diseases or pest infectations. Such fine-grained mapping will be conducted using new methods that demonstrate a "curious but busy bee explorer", i.e., the drone will explore an unknown region and once it obtains enough information from this region, automatically move on to a new region. This system will be demonstrated via experiments in the Wharton State Forests and Virginia State Forests for improved decision making in forestry, e.g., logging strategies in pine plantations, mapping individual tree locations, species and physiology.
Animal Health Component
(N/A)
Research Effort Categories
Basic
100%
Applied
(N/A)
Developmental
(N/A)
Classification

Knowledge Area (KA): 123
Subject of Investigation (SOI): 0699
Field of Science (FOS): 2080
Percent: 100%
Goals / Objectives
Our vision and summary of proposed research
We envision a system wherein a scientist guides a team of UAVs to conduct research. Each UAV is equipped with a 3D LiDAR, high-resolution cameras, and onboard computation, and can function fully autonomously without direct human intervention in the absence of GPS. The scientist delineates large areas (up to 250 acres) using over-canopy data for exploration and mapping. The UAV team leverages our new planning algorithms for active information gathering that utilize cross-view collaboration of different UAVs to fit geometric and semantic models of the overstory. The scientist can also transport this UAV team to a different region after swapping out batteries and repeat this exploration; we will develop a large-scale autonomy stack that can operate across 1000 acres for this purpose. The second scale of operation is fine-grained mapping, where the focus is on a smaller region (about 5 acres). We will gather rich geometric information (tree location, height, detailed stem profile, features such as knots, and understory volume) and semantic information (species of all vegetation, visual indicators of insect infestation), using new methods for fusing RGB and point-cloud data from the scene and novel theoretical and algorithmic tools for active semantic exploration. The scientist performs such data-gathering periodically, say once each season. Back at the University, she proposes data-driven policy decisions, e.g., logging strategies for the timber industry. Using under-canopy UAV-generated maps, she can make detailed analyses of the locations and species of individual crop trees, understand aspects that are important for management such as first-order branching, and develop a quantitative understanding of non-crop trees and understory vegetation for informed silviculture prescriptions. Our experiments will be performed in the Wharton State Forest in New Jersey and Virginia State Forests.
Intellectual Merit
Our research agenda is developed from the ground up to address challenges in forest science and envisions large-scale, detailed, on-demand data gathering for decision making. This research will advance the state of the art in large-scale mapping with size, weight, and power (SWaP) constrained platforms, active exploration and mapping with multiple heterogeneous robots/sensors, semantic mapping of highly unstructured environments using multi-modal data, and active semantic perception. The research will lead to a framework for leveraging synergies between humans and robots, allowing human experts to focus on understanding the state of the forest (trees, understory) and relying on robots to make high-resolution measurements that uncover spatial patterns which would be tedious or impossible to capture at scale.
Broader Impacts
Our efforts will have a significant and immediate impact on the large number of stakeholders in forestry with interests in yield estimation, inventory management, forecasting, and sustainability. The work also has implications for precision agriculture more broadly, and our research will strengthen rural economies and increase the competitiveness of US agriculture. We will publicly release data collected from this project as "ForestNet", which will further spur the strong interest among citizen scientists in data-science problems related to nature and climate change.
Our collaborations with the Forest Modeling Research Cooperative (FMRC) will ensure impact on policy making, while our connections to companies such as TreeSwift will translate our basic research into commercial impact. We will work with the Philadelphia School District, which serves a primarily low-income and underrepresented minority population, to develop school curricula that involve agriculture and nature. We will also engage the general public via outreach to forest growers (56% of forests in the US are privately owned) and lab demonstrations of UAV technology given to 1,200 K-12 students per year.
Research Objectives
The objectives of this research are as follows.
Thrust A: Large-scale mapping using multiple UAVs
A.1 Active mapping to gather actionable information over large areas
A.2 Scaling up the autonomy stack to map 10x larger areas
A.3 Mapping with heterogeneous robots
Thrust B: Fine-grained semantic understanding of unstructured environments
B.1 Combining visual and point-cloud data to build representations of the scene tailored for decision making in forestry
B.2 Active semantic scene understanding
B.3 Scalable annotation of forestry data
Thrust C: Pairing human-collected ground measurements with UAV data
C.1 Application in managed loblolly pine plantations
C.2 Applications in diverse forest cover types
Work Plan
Penn PIs will co-direct and develop the research agenda of Thrust A (large-scale mapping using UAVs) and Thrust B (fine-grained semantic scene understanding), while the Virginia Tech PIs will co-direct the research agenda on forest measurements in Thrust C. See the Collaboration Plan for more details. One Postdoctoral Researcher and two graduate students (one each from Penn and Virginia Tech) will work on this project. We envision two PhD theses from this project, one on "active mapping for semantic scene understanding" and another on "allometric and understory measurements using point-cloud and spectral imaging data". Data will be collected from the Wharton State Forest and research pine plantations in Virginia State Forests.
Year 1: Set up three-level (over-canopy flight, under-canopy flight, ground measurements) data collection. Characterize uncertainty as a function of sampling (A.1). Train models for fusing RGB and point-cloud data (B.1). Data collection in managed pine plantations (C.1) and annotation (B.3).
Year 2: Autonomous large-scale flight combining coverage (raster pattern) with active mapping (A.1). Develop theory for fine-grained active mapping (B.2). Study branching and understory in managed pine plantations (C.1).
Year 3: Large-scale autonomous flight (A.2) with multiple UAVs (A.3). Demonstrate active semantic mapping (B.2). Study diverse forest types (C.2).
Year 4: Autonomy stack for 250-acre flight (A.2) with multiple UAVs (A.3). Integrated demonstration of the full system and evaluation. Evaluation via external collaborators in the Robotics Group at the Commonwealth Scientific and Industrial Research Organization (CSIRO), Australia.
Project Methods
Thrust A: Large-scale mapping using multiple UAVs
We will build a heterogeneous team consisting of UAVs capable of over-canopy flight, UAVs that can fly under the canopy, and a human scientist who can perform inspection tasks and deploy UAVs from the back of a truck. Over-canopy flights enable a fast survey of the environment, allowing the scientist to delineate areas for further exploration and mapping. Under-canopy robots deployed by the scientist can follow either trajectories prescribed by the scientist or predetermined coverage trajectories that are augmented by active information-gathering planning algorithms. Multiple under-canopy robots can also collaborate with each other and rendezvous with the truck driven by the scientist for recharging.
A.1 Active mapping to gather actionable information over large areas
We will use our expertise in autonomous under-canopy flight and object tracking to actively map large-scale forest environments. We will use over-canopy images to build a density model using a Gaussian Process. This density model will be used to plan multi-scale paths for under-canopy flights using reinforcement learning (a minimal sketch of the density-model idea appears at the end of this Project Methods section). We will perform large-scale semantic mapping and design measurement models that capture both geometric and semantic properties, using algorithms for active information gathering.
A.2 Scaling up the autonomy stack to map 10x larger areas
We propose a new hierarchical approach to mapping and planning for autonomous flight. The high-level map for path planning will span up to several kilometers and consist of the geometry and semantics of each object. The low-level map will be robot-centric and consist of finely discretized voxel grids. A mid-level map will have a coarsely discretized representation and enable efficient, collision-free long-horizon planning.
A.3 Mapping with heterogeneous robots
Our team of robots will include ground vehicles that transport the UAVs between launch and recovery sites. We will enforce constraints that come from the forest-road transport truck while implementing this multi-UAV system. Another key innovation is a user-determined overlap constraint on the area mapped by each UAV, which makes the system resilient to communication failures.
Thrust B: Fine-grained semantic understanding of unstructured environments
This thrust investigates new methods for active exploration to obtain a fine-grained semantic understanding of the scene. We will develop new methods to fuse RGB data with LiDAR-based point clouds that are designed to work well in highly unstructured environments. We will also develop theoretical and algorithmic tools for recovering actionable information from the scene, i.e., actively exploring the scene so as to recover an accurate estimate of forestry-relevant quantities such as tree volume and species.
B.1 Combining visual and point-cloud data to build representations of the scene tailored for decision making in forestry
We will design methods for fusing RGB and point-cloud data and build deep learning models for predicting tree species, physiology (trunk, branches, crown), and tree form (excurrent/decurrent/shrub, which are typically available).
We will estimate (i) stem and branch volume (high economic value), (ii) tree form and stem taper (to compare with and refine existing models), (iii) biomass of undergrowth (to better estimate carbon capture), (iv) crown growth of trees and undergrowth (extremely labor-intensive and imprecise today), and (v) knots and insect infestation.
B.2 Active semantic scene understanding
We seek to build a representation that is (i) rich enough to accurately solve the task, but (ii) no more complex than necessary. Such a representation should be learned efficiently, i.e., (iii) it should inform how the robot should move to reduce the uncertainty of the task. We will formalize and instantiate this idea using information theory.
B.3 Scalable annotation of forestry data
We will upload our data to iNaturalist to obtain ground-truth annotations of the images. This will help create a large corpus of forestry-specific data. We will build a prototype annotation system for LiDAR and visual data for forestry researchers.
Thrust C: Pairing human-collected ground measurements with UAV data
Our goal is to pair measurements that humans excel at capturing (e.g., species) with information that robots excel at capturing (e.g., complex spatial patterns). Advanced forest monitoring capabilities will result in data that facilitates better management decisions across diverse forest cover types.
C.1 Application in managed loblolly pine plantations
Sites with recent thinning operations will be chosen with consideration given to homogeneity of site productivity, homogeneous post-thin conditions, and availability of historical metadata. Four treatments will be imposed at each location: (i) thinning-only control, (ii) thinning followed by forest fertilization, (iii) thinning followed by non-crop tree chemical vegetation control, and (iv) thinning followed by fertilization and vegetation control. The spatial location of each tree will be determined, and metadata relating to soil, topography, and previous management history will be recorded. At each measurement interval, common allometric measurements will be obtained. Additionally, non-crop competing vegetation canopy cover will be quantified.
C.2 Applications in diverse forest cover types
Assessing characteristics of all forest cover types is important for helping managers maximize the diverse benefits that forests provide. We will address the following questions.
1. To what degree can point-cloud and RGB data collected by UAVs be used to detect, measure common allometric values for (e.g., DBH and total height), and spatially map individual overstory trees in diverse, natural forests?
2. To what degree can UAV point-cloud and RGB data be used to classify and quantify understory vegetation in diverse, natural forest cover types?
3. In riparian streamside management zones (SMZs) interspersed in managed pine plantations, to what degree can UAV point-cloud and RGB data be used to quantify the characteristics of SMZs outlined for Virginia (VDOF 2011)?
Evaluation
Our UAV platforms range from small (200 g) to large (3.5 kg) depending on the sensor and computer. The Ouster OS1-64 3D LiDAR provides accurate measurements with a large field of view, and the three cameras will be used to capture high-resolution images. Our robust and performant autonomy stack has been in development for several years and consists of state estimation, planning, and control modules.
We have developed a rig for a human operator to control the UAVs outdoors, which will also be used in this project. Standard benchmarks in computer vision (KITTI, NYU-Depth, MS COCO) will be used to prototype sensor fusion models. The iNaturalist dataset will be used to develop the classification pipeline using models pre-trained on ImageNet. We will first develop semantic mapping using FlightGoggles [90] and then use our ROS simulator.
Allometric measurements (trunk and canopy physiology) from UAV flights and machine learning will be compared to those from handheld terrestrial laser scanning, ground-based standing-tree measurements, and felled-tree measurements. We will compare these to existing measurements from Burkhart's group taken in FMRC plantations. Species classification (both crop and non-crop trees) will be cross-checked by random sampling by a human. Ground-based measurements will be considered "truth", although we recognize that these may contain some error.
We will collaborate (sharing data and algorithms) with the Commonwealth Scientific and Industrial Research Organization (CSIRO), Australia, to benefit from their work on navigation in forest environments with ground vehicles for biodiversity surveys and forest health monitoring.
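As referenced in A.1 above, one ingredient of the planned pipeline is a canopy-density model fit to over-canopy data that guides where under-canopy flights are most informative. The sketch below is a minimal illustration of that idea, not the project's implementation: it fits a Gaussian Process to synthetic density samples with scikit-learn and greedily selects the most uncertain candidate waypoint. The kernel, length scale, grid, and uncertainty-greedy selection rule are all illustrative assumptions.

```python
# Hedged sketch: GP canopy-density model + uncertainty-driven waypoint selection.
# Synthetic data stands in for densities derived from over-canopy imagery.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
xy = rng.uniform(0, 500, size=(200, 2))                  # sample locations (m)
density = np.exp(-((xy - 250.0) ** 2).sum(1) / 2e4)      # stand-in canopy density in [0, 1]

gp = GaussianProcessRegressor(
    kernel=RBF(length_scale=50.0) + WhiteKernel(noise_level=1e-2),
    normalize_y=True,
)
gp.fit(xy, density)

# Candidate under-canopy waypoints on a coarse grid; pick the most uncertain one.
gx, gy = np.meshgrid(np.linspace(0, 500, 26), np.linspace(0, 500, 26))
candidates = np.column_stack([gx.ravel(), gy.ravel()])
mean, std = gp.predict(candidates, return_std=True)
best = np.argmax(std)
print("next waypoint to map:", candidates[best], "predicted density:", mean[best])
```

In the proposed system this uncertainty signal would feed a reinforcement-learning planner rather than a greedy rule; the sketch only shows the modeling step.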

Progress 03/01/24 to 02/28/25

Outputs
Target Audience: Members of the Forest Modeling Research Cooperative (FMRC) based at Virginia Tech have been regularly updated on the progress of this project. The FMRC is an industrial research cooperative, and the methods and results from this project may be widely implemented across many thousands of acres of industrial timberlands. Research talks on this material were given to faculty of the General Robotics Automation Sensing and Perception Laboratory and to undergraduate, Master's, and doctoral students at Penn, as well as during lab tours and visits. Ankit Prabhu, a Research Assistant at the University of Pennsylvania, gave a research seminar at the annual FMRC meeting, where he interacted with stakeholders in the forestry industry on how to incorporate robotics technology. PhD student Xu Liu, Ankit Prabhu, and Jizhou Li (Master's student at Penn) made regular visits to an apple orchard near Philadelphia to collect data on how the appearance and size of apples change over time as they grow (early spring through the summer). These efforts have been crucial in teaching us which problems to focus on, e.g., mapping unstructured trees and plants.
Changes/Problems: Nothing Reported
What opportunities for training and professional development has the project provided? This project is currently training the following undergraduate, Master's, and doctoral students: 1. Xu Liu (Mechanical Engineering PhD, just graduated, now a postdoc at Stanford), 2. Dexter Ong (Computer Science PhD, 3rd year), 3. Ankit Prabhu (Research Assistant), 4. Siming He (Computer Science undergraduate senior), 5. Fernando Cladera (6th year PhD), 6. Derek Chang (2nd year Master's, just graduated), and 7. Zach Osman (2nd year Master's student). A 1st-year undergraduate student, Yazlin Moujalled from the University of Maryland Baltimore County, was also involved in this research as part of her REU. She worked on developing a system for identifying different species of trees, plants, and weeds using large vision-language models. These students have engaged in research (which includes field experiments in the Virginia and Wharton State forests), gathering and processing data from the hardware platform, as well as building the hardware platform for hand-held mapping of trees and autonomous flight. A number of simulation experiments were conducted before these efforts to fine-tune our methods and explore new algorithms. In addition, these students have taken part in weekly group meetings (joint with the groups of Vijay Kumar, Pratik Chaudhari, and Virginia Tech PI Patrick Corey Green).
How have the results been disseminated to communities of interest? These results were presented at conferences (ICRA, ACC, CVPR) over Summer 2024, and others will be presented at ICRA 2025. Some of the team members are also organizing a workshop on Robotics for Agriculture and Forestry at ICRA 2025. A major journal paper on these results is being written. Excerpts from these results were presented during lab tours and visits, e.g., to a local high school and to staff and administrative personnel from Ivy League colleges during their annual visit to Penn.
What do you plan to do during the next reporting period to accomplish the goals? The key things we will focus on in the coming year are:
1. Finish our analysis of the collected forestry data to build a system that can produce photorealistic maps of areas up to 2 sq. km. The system allows a user to click on any point in this region, visualize the geometry of the tree, and obtain statistics such as wood volume and tree species; it can also summarize these statistics over large regions. We expect to demonstrate this system and open it up to the general public this year.
2. We have built a new drone with better cameras and an onboard GPU. This drone can give us more detailed observations higher up in the forest canopy because we can fly closer to dense branches.
3. Over the last year, we have developed the ability to conduct experiments with ground and aerial vehicles together. We are currently instrumenting a legged robot that can traverse forest terrain much more easily. This robot will be used for some of the field experiments.
4. We will refine our techniques to model the changes in the appearance of plants/trees over time. We have seen promising results on apple orchard data and will now translate these results to trees in a forest across seasons.

Impacts
What was accomplished under these goals?
Large-Scale Mapping (Thrust A)
Forest Mapping. Current forestry measurements use low-data-rate optical sensors (small LiDARs) and heavy manual involvement (e.g., walking across large areas). We have shown that 360-degree cameras offer a much cheaper alternative, providing large amounts of rich visual data. Over the past year, we collected large datasets from a research forest at Virginia Tech (where we have ground-truth data from collaborators) and outdoor parks near Philadelphia. We gathered both 3D LiDAR (Ouster) data and 360-degree 4K RGB camera data from a manually flown drone and a human-carried sensor rig. We processed large parts of this data into point clouds (representing tree trunks, branches, leaves, etc.) using both LiDAR and RGB data. We developed new techniques for synthesizing new views and semantic segmentation, and built a pipeline to extract individual trees. We combined this data with inertial sensors (accelerometers and gyroscopes) to compute statistics like diameter at breast height and compare them to LiDAR-based and human-collected measurements. Our results suggest we can recover tree diameter with about 15% error using only RGB data. These techniques could shift how forestry data is collected.
From NeRFs to Gaussian Splats, and Back. In robotics, with limited (typically ego-centric) views, parametric representations like neural radiance fields (NeRFs) generalize better than non-parametric ones like Gaussian splatting (GS) for views different from the training data. However, GS renders much faster than NeRFs. We developed a procedure to convert between the two, achieving the benefits of both: NeRFs provide superior PSNR, SSIM, and LPIPS on dissimilar views, and GS offers real-time rendering with easy modification of the representation. The computational cost of these conversions is minimal compared to training both from scratch.
4D Metric-Semantic Mapping for Persistent Orchard Monitoring: Method and Dataset. Automated, persistent monitoring of orchards at the tree or fruit level maximizes crop yield and optimizes resources like water, fertilizers, and pesticides. We present a 4D spatio-temporal metric-semantic mapping method that fuses LiDAR, RGB, and IMU data to track fruits in an orchard over their growth season. A LiDAR-RGB fusion module segments fruits with a deep neural network and tracks them using the Hungarian assignment algorithm (a minimal sketch of this association step appears at the end of this section). The 4D data association module aligns data from different growth stages and tracks fruits spatially and temporally, providing information like fruit counts, sizes, and positions. We demonstrate the accuracy of our method with real orchard data under natural, uncontrolled conditions with seasonal variations. We achieve 3.1% error in fruit count estimation across 1790 fruits and 60 trees, and a 1.1 cm mean error in size estimation. The datasets, including LiDAR, RGB, and IMU data for five fruit species, will be made publicly available at: https://4d-metric-semantic-mapping.org/
Active Perception (Thrust B)
An Active Perception Game for Robust Information Gathering. Active perception selects future viewpoints using an estimate of information gain. An inaccurate estimate can be detrimental in critical situations (e.g., locating a person in distress quickly). However, true information gain can only be calculated post hoc. We present a method for estimating the discrepancy between the predicted and actual information gain. By analyzing the relationship between active perception and estimation error in a game-theoretic setting, we developed an online approach that reduces sub-optimality and achieves sub-linear regret in estimating true information gain. Our experiments, using a range of environments, robotic platforms, and perception data, show that our approach reduces information-gain errors by 42%, increases information gain by 7%, improves PSNR by 5%, and enhances semantic accuracy by 6%. In real-world experiments with a Jackal ground robot, the approach produces complex trajectories to explore occluded regions.
Active Scout: Multi-Target Tracking Using Neural Radiance Fields in Dense Urban Environments. We study pursuit-evasion games in highly occluded urban environments (e.g., tall buildings) where a scout (quadrotor) tracks multiple dynamic targets on the ground. We show that a neural radiance field (NeRF) can be built online using RGB and depth images from different vantage points, allowing information gain to be calculated for tracking targets and exploring unknown areas. We demonstrate this with a custom-built simulator using OpenStreetMaps data from Philadelphia and New York City, showing that we can locate 20 stationary targets within 300 steps. For dynamic targets that hide behind occlusions, our approach maintains a tracking error of 200 m, while a greedy baseline can have a 600 m error. Our scout's behavior adapts over time, switching attention between targets as the NeRF representation improves.
Forestry Research (Thrust C)
PhD student Nitant Rai at Virginia Tech is finalizing his research plan with the following objectives for 2025-2026:
1. Using UAS hyperspectral data and LiDAR point clouds, assess accuracy in detecting seedlings in loblolly pine forests.
2. Investigate the robustness of predictive models for forest structure with varying LiDAR resolutions.
3. Develop new methods to use parametric stem taper models to predict tree volumes from incomplete point-cloud profiles.
Accomplishments:
1. Dissemination of research findings in the Forest Modeling Research Cooperative 2024 Annual Report.
2. Presentation of research findings at the Forest Modeling Research Cooperative 2024 Annual Meeting.
3. Preliminary data collection for seedling detection in the Appomattox-Buckingham State Forest.
4. Initial development of methods to assess the robustness of forest structure metrics using existing point clouds and plot measurements.
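To make the data-association step of the 4D orchard-mapping work concrete, here is a minimal, hedged sketch of Hungarian assignment on fruit centroids using SciPy. The centroid arrays and the 0.3 m gating threshold are illustrative assumptions only; the actual module operates on LiDAR-RGB fused segmentations rather than raw centroids.

```python
# Hedged sketch: match fruit detections between two scans by solving a
# Hungarian (linear sum) assignment on pairwise centroid distances.
import numpy as np
from scipy.optimize import linear_sum_assignment

prev = np.array([[0.2, 1.1, 2.0], [1.5, 0.9, 2.2], [3.0, 1.2, 1.8]])                     # centroids at t-1 (m)
curr = np.array([[0.25, 1.05, 2.02], [2.95, 1.25, 1.79], [1.6, 0.85, 2.25], [4.0, 0.5, 1.5]])  # centroids at t

cost = np.linalg.norm(prev[:, None, :] - curr[None, :, :], axis=-1)   # pairwise distances
rows, cols = linear_sum_assignment(cost)                              # optimal one-to-one matching

max_match_dist = 0.3   # gate (m): larger distances are treated as new fruits / missed detections
matches = [(r, c) for r, c in zip(rows, cols) if cost[r, c] < max_match_dist]
unmatched_curr = sorted(set(range(len(curr))) - {c for _, c in matches})
print("matched tracks:", matches, "new detections:", unmatched_curr)
```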

Publications

  • Type: Conference Papers and Presentations Status: Published Year Published: 2024 Citation: He, Siming, Zach Osman, and Pratik Chaudhari. "From NeRFs to Gaussian Splats, and Back." CVPR Workshop arXiv:2405.09717 (2024).
  • Type: Conference Papers and Presentations Status: Published Year Published: 2024 Citation: He, Siming, Yuezhan Tao, Igor Spasojevic, Vijay Kumar, and Pratik Chaudhari. "An Active Perception Game for Robust Autonomous Exploration." ICRA 2025.
  • Type: Conference Papers and Presentations Status: Awaiting Publication Year Published: 2024 Citation: Lei, Jiuzhou, Ankit Prabhu, Xu Liu, Fernando Cladera, Mehrad Mortazavi, Reza Ehsani, Pratik Chaudhari, and Vijay Kumar. "4D Metric-Semantic Mapping for Persistent Orchard Monitoring: Method and Dataset." arXiv preprint arXiv:2409.19786 (2024).


Progress 03/01/23 to 02/29/24

Outputs
Target Audience: Members of the Forest Modeling Research Cooperative (FMRC) based at Virginia Tech have been regularly updated on the progress of this project. The FMRC is an industrial research cooperative, and the methods and results from this project may be widely implemented across many thousands of acres of industrial timberlands. Research talks on this material were given to faculty of the General Robotics Automation Sensing and Perception Laboratory and to undergraduate, Master's, and doctoral students at Penn, as well as during lab tours and visits.
Changes/Problems: Nothing Reported
What opportunities for training and professional development has the project provided? This project is currently training the following undergraduate, Master's, and doctoral students: 1. Xu Liu (Mechanical Engineering PhD, 5th year), 2. Dexter Ong (Computer Science PhD, 2nd year), 3. Yifei Shao (Computer Science PhD, 2nd year), 4. Ankit Prabhu (Research Assistant), 5. Siming He (Computer Science undergraduate senior), 6. Fernando Cladera (5th year PhD), and 7. Derek Chang (2nd year Master's). Some undergraduate students were also involved in this research in Summer 2023: Alicia Sun (1st year) and Alan Zhu (1st year). As part of this project, these students have engaged in research (which includes field experiments in the Virginia and Wharton State forests), gathering and processing data from the hardware platform, as well as building the hardware platform for hand-held mapping of trees and autonomous flight. A number of simulation experiments were conducted before these efforts to fine-tune our methods and explore new algorithms. In addition, these students have taken part in weekly group meetings (joint with the groups of Vijay Kumar, Pratik Chaudhari, and Virginia Tech PI Patrick Corey Green).
How have the results been disseminated to communities of interest? These results will be published and presented at conferences (ICRA and ACC) over Summer 2024. Two journal papers (one published and one in progress) have resulted from this work. Excerpts from these results were presented during lab tours and visits, e.g., to a local high school and to staff and administrative personnel from Ivy League colleges during their annual visit to Penn.
What do you plan to do during the next reporting period to accomplish the goals? In the next year, we will focus on the following action items.
- Use high-resolution visual data (2x fisheye cameras that together record 8K frames at 30 fps) to build photorealistic maps of trees. Compare these maps to catalogues maintained by the Institute Architect at Penn to understand the different kinds of trees on campus.
- Instrument a UAV with this visual system and collect data using manual and autonomous flights in New Jersey and Virginia forests.
- Demonstrate multi-UAV flight and communications in a forest.
- Develop a system that can recover fine-grained details of the trees, branching patterns, and shrubs in large forested areas using a combination of neural radiance fields and features of foundation models such as CLIP that are grounded in the physical scene (a minimal zero-shot CLIP sketch follows this list).
- Develop procedures to model changes in the appearance of trees across seasons and utilize this information for localization.
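As a hedged illustration of the vision-language direction mentioned above, the following sketch assigns a coarse species label to a single tree photo with zero-shot CLIP via the Hugging Face transformers API. The model checkpoint, the prompt wording, the species list, and the image path are illustrative assumptions; the project's pipeline grounds such features in 3D maps rather than classifying individual images.

```python
# Hedged sketch: zero-shot species labeling of a tree photo with CLIP.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

species = ["loblolly pine", "white oak", "red maple", "American beech"]   # illustrative list
prompts = [f"a photo of the bark and leaves of a {s} tree" for s in species]

image = Image.open("tree_example.jpg")   # hypothetical field photo
inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    probs = model(**inputs).logits_per_image.softmax(dim=-1).squeeze(0)

for s, p in sorted(zip(species, probs.tolist()), key=lambda t: -t[1]):
    print(f"{s}: {p:.2f}")
```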

Impacts
What was accomplished under these goals? We have made progress on the three main thrusts in the following directions.
UAVs for forestry: Metric-semantic mapping and diameter estimation with autonomous aerial robots (Thrusts A, C). To properly monitor the growth of forests and administer effective methods for their cultivation, forestry researchers require access to quantitative metrics such as diameter at breast height and the stem taper profile of trees. These metrics are tedious and labor-intensive to measure by hand, especially at the scale of vast forests with thick undergrowth. Autonomous mobile robots can help scale up such operations and provide an efficient method to capture the data. We present a set of algorithms for autonomous navigation and fine-grained metric-semantic mapping with a team of aerial robots in under-canopy forest environments. Our autonomous UAV system has 3D flight capabilities and relies only on a LiDAR and an IMU for state estimation and mapping. This allows each robot to accurately navigate in challenging forest environments with drastic terrain changes, regardless of illumination conditions. Our deep-learning-driven fine-grained metric-semantic mapping module detects and extracts detailed information such as the position, orientation, and stem taper profile of trees. The map of tree trunks is represented as a set of sparse cylinder models. Our semantic place recognition module leverages this sparse representation to efficiently estimate the relative transformation between multiple robots and merge their information into a globally consistent large-scale map (a small sketch of this alignment step appears at the end of this section). This ultimately allows us to scale up operations with multiple robots. Our system achieves a mean absolute error of 1.45 cm for diameter estimation and 13.2 cm for relative position estimation between a pair of robots after place recognition and map merging.
TreeScope: An Agricultural Robotics Dataset for LiDAR-Based Mapping of Trees in Forests and Orchards (Thrust C). Data collection for forestry, timber, and agriculture currently relies on manual techniques which are labor-intensive and time-consuming. We seek to demonstrate that robotics offers improvements over these techniques and accelerates agricultural research, beginning with semantic segmentation and diameter estimation of trees in forests and orchards. We present TreeScope v1.0, the first robotics dataset for precision agriculture and forestry addressing the counting and mapping of trees in forests and orchards. TreeScope provides LiDAR data from agricultural environments collected with robotics platforms, such as UAVs and mobile robot platforms carried by vehicles and human operators. In the first release of this dataset, we provide ground-truth data with over 1,800 manually annotated semantic labels for tree stems and field-measured tree diameters. We share benchmark scripts for these tasks that researchers may use to evaluate the accuracy of their algorithms. Finally, we run our open-source diameter estimation and off-the-shelf semantic segmentation algorithms and share our baseline results.
VEMS-SLAM: Versatile, Efficient, Metric-Semantic SLAM for Multi-Robot Navigation and Exploration (Thrust A). We propose a generic and versatile metric-semantic SLAM (MS-SLAM) framework for autonomous exploration with heterogeneous robot teams. The proposed MS-SLAM framework supports different types of sensors, including LiDARs and RGB-D cameras. It enables a heterogeneous team of robots to autonomously explore 3D environments featuring both indoor and outdoor areas without relying on GPS. The framework utilizes a hierarchical metric-semantic representation of the environment, ranging from high-level sparse semantic maps to low-level dense geometric maps. To facilitate autonomous exploration and navigation in large-scale environments, we propose a high-level map representation that is sparse and semantically meaningful, consisting of explicitly modeled object landmarks. Building upon this representation, we use customized measurement models for each type of semantic object and incorporate them into a factor graph, so that the robot can efficiently optimize over the semantic map and its poses during the entire mission. We leverage the sparsity, informativeness, and viewpoint invariance of this representation and propose a direct but effective semantic place recognition algorithm to detect both intra-robot and inter-robot loop closures, correct odometry drift, and merge maps from multiple robots. We design a multi-robot collaboration framework that keeps track of the robot's own metric-semantic observations and the observations from other robots within communication range. This framework also incorporates the proposed place recognition algorithm, so that observations from other robots can be merged into a common metric-semantic factor graph whenever a valid loop closure is detected. Due to the sparsity and informativeness of the high-level semantic map, our system runs in real time on each robot in a decentralized manner and opportunistically leverages communication. We integrated and deployed the MS-SLAM framework on three types of aerial and ground robots that use either RGB-D cameras or LiDARs, and demonstrated that the team of robots can explore large-scale indoor and outdoor 3D environments. We show, through both theoretical analysis and empirical evidence, that the framework has low computational and memory demands, even with large teams of robots autonomously exploring large-scale environments. In addition, we benchmark our system against state-of-the-art metric-semantic SLAM algorithms.
Active Perception using Neural Radiance Fields (Thrust B). We study active perception from first principles to argue that an autonomous agent performing active perception should maximize the mutual information that past observations possess about future ones. Doing so requires (a) a representation of the scene that summarizes past observations and the ability to update this representation to incorporate new observations (state estimation and mapping), (b) the ability to synthesize new observations of the scene (a generative model), and (c) the ability to select control trajectories that maximize predictive information (planning). This motivates a neural radiance field (NeRF)-like representation which captures the photometric, geometric, and semantic properties of the scene. This representation is well-suited to synthesizing new observations from different viewpoints, and thereby a sampling-based planner can be used to calculate the predictive information from synthetic observations along dynamically feasible trajectories. We use active perception for exploring cluttered indoor environments and employ a notion of semantic uncertainty to check for the successful completion of an exploration task. We demonstrate these ideas via simulation in realistic 3D indoor environments.
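As an illustration of the map-merging step referenced in the metric-semantic mapping work above, the following hedged sketch estimates the relative rigid transform between two robots' sparse trunk maps once trunk correspondences are available. In the actual system the correspondences come from semantic place recognition and the optimization runs inside a factor graph; here the correspondences are assumed known, the trunk coordinates are synthetic, and a closed-form SVD (Kabsch) alignment stands in for the full pipeline.

```python
# Hedged sketch: align two robots' sparse trunk-landmark maps with the Kabsch method.
import numpy as np

def relative_transform(p_a, p_b):
    """Return R, t such that R @ p_b[i] + t ~= p_a[i] for matched trunk centers."""
    ca, cb = p_a.mean(0), p_b.mean(0)
    H = (p_b - cb).T @ (p_a - ca)          # 3x3 cross-covariance of centered points
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:               # guard against reflections
        Vt[-1] *= -1
        R = Vt.T @ U.T
    return R, ca - R @ cb

# Trunk centers in robot A's frame, and the same trees as seen by robot B (synthetic).
trunks_a = np.array([[0.0, 0.0, 0.0], [4.0, 1.0, 0.1], [2.0, 5.0, -0.2], [7.0, 3.0, 0.0]])
theta = np.deg2rad(30.0)
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0,            0.0,           1.0]])
trunks_b = (trunks_a - np.array([1.0, -2.0, 0.0])) @ R_true

R_est, t_est = relative_transform(trunks_a, trunks_b)
print("estimated rotation:\n", np.round(R_est, 3), "\ntranslation:", np.round(t_est, 3))
```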

Publications

  • Type: Journal Articles Status: Published Year Published: 2024 Citation: Prabhu, Ankit, Xu Liu, Igor Spasojevic, Yuwei Wu, Yifei Shao, Dexter Ong, Jiuzhou Lei, Patrick Corey Green, Pratik Chaudhari, and Vijay Kumar. "UAVs for forestry: Metric-semantic mapping and diameter estimation with autonomous aerial robots." Mechanical Systems and Signal Processing 208 (2024): 111050.
  • Type: Conference Papers and Presentations Status: Accepted Year Published: 2024 Citation: He, Siming, Christopher D. Hsu, Dexter Ong, Yifei Simon Shao, and Pratik Chaudhari. "Active Perception using Neural Radiance Fields." American Control Conference, 2024.
  • Type: Conference Papers and Presentations Status: Accepted Year Published: 2024 Citation: Cheng, Derek, Fernando Cladera Ojeda, Ankit Prabhu, Xu Liu, Alan Zhu, Patrick Corey Green, Reza Ehsani, Pratik Chaudhari, and Vijay Kumar. "TreeScope: An Agricultural Robotics Dataset for LiDAR-Based Mapping of Trees in Forests and Orchards." ICRA 2024
  • Type: Conference Papers and Presentations Status: Accepted Year Published: 2024 Citation: Shao, Yifei Simon, Yuwei Wu, Laura Jarin-Lipschitz, Pratik Chaudhari, and Vijay Kumar. "Design and Evaluation of Motion Planners for Quadrotors." ICRA 2024.


Progress 03/01/22 to 02/28/23

Outputs
Target Audience: Members of the Forest Modeling Research Cooperative (FMRC) based at Virginia Tech have been updated on project progress. The FMRC is an industrial research cooperative, and the methods and results from this project may be widely implemented across many thousands of acres of industrial timberlands. Research talks on this material were given to faculty of the General Robotics Automation Sensing and Perception Laboratory and to undergraduate, Master's, and doctoral students at Penn, as well as during lab tours and visits, e.g., to 28 students from a local high school.
Changes/Problems: Patrick Corey Green assumed the role of Principal Investigator at Virginia Tech. This was approved by the prime awardee, the University of Pennsylvania, and the Program Manager at USDA/NIFA.
What opportunities for training and professional development has the project provided? This project is currently training the following undergraduate, Master's, and doctoral students: Xu Liu (Mechanical Engineering PhD, 4th year), Dexter Ong (Computer Science PhD, 1st year), Yifei Shao (Computer Science PhD, 1st year), Ankit Prabhu (Robotics Master's, 2nd year), Yingtian Tang (Computer Science Master's, 2nd year), Daiwei Chen (Electrical Engineering Master's, 2nd year), Siming He (Computer Science undergraduate), and Alan Zhu (Computer Science undergraduate). As part of this project, these students have engaged in research (which includes field experiments in the Virginia and Wharton State forests), gathering and processing data from the hardware platform, as well as building the hardware platform for hand-held mapping of trees and autonomous flight. A number of simulation experiments were conducted before these efforts to fine-tune our methods and explore new algorithms. In addition, these students have taken part in weekly group meetings (joint with the groups of Vijay Kumar, Pratik Chaudhari, and Virginia Tech PI Patrick Corey Green).
How have the results been disseminated to communities of interest? Excerpts from these results were presented during lab tours and visits, e.g., to 28 students from a local high school and to staff and administrative personnel from Ivy League colleges during their annual visit to Penn.
What do you plan to do during the next reporting period to accomplish the goals? We do not anticipate any changes to the intellectual agenda discussed in the original proposal. In the next year, we will focus on the following action items.
- Improve the performance of LiDAR odometry to enable robust autonomous flight in forests over long time horizons.
- Build solutions to combine the data obtained from multiple drones flying simultaneously to map a large part of the forest; this involves research on scene understanding, active exploration, and bundle adjustment.
- Develop a pipeline for species identification that works for the vegetation expected in the Virginia State Forests, and develop deep learning-based object detection that can accurately identify the species of a tree using multiple images of its leaves and bark across a broad range of lighting conditions (a simplified classification sketch follows this list). This also includes building a pipeline for annotating such data.
- Apply the methodology for mapping and volume estimation broadly across planted loblolly pine forests. In addition to continued efforts at accurately characterizing the main stem, begin to assess tree crown metrics including crown ratio, crown length, crown volume, and crown area. Additionally, develop methods to quantify competing understory vegetation in commercial timberlands.
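As a hedged, simplified stand-in for the planned species-identification pipeline (the plan calls for object detection; a plain image classifier is shown here for brevity), the sketch below fine-tunes a pretrained CNN on labeled photos of leaves and bark. The directory layout ("species_images/train/<species_name>/*.jpg"), model choice, and hyperparameters are illustrative assumptions, not the project's final design.

```python
# Hedged sketch: fine-tune a pretrained ResNet-18 for tree-species classification.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms

tfm = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])
train_set = datasets.ImageFolder("species_images/train", transform=tfm)  # hypothetical dataset
loader = DataLoader(train_set, batch_size=32, shuffle=True, num_workers=4)

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, len(train_set.classes))       # one logit per species

opt = torch.optim.AdamW(model.parameters(), lr=3e-4)
loss_fn = nn.CrossEntropyLoss()
model.train()
for epoch in range(5):
    for images, labels in loader:
        opt.zero_grad()
        loss = loss_fn(model(images), labels)
        loss.backward()
        opt.step()
    print(f"epoch {epoch}: last-batch loss {loss.item():.3f}")
```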

Impacts
What was accomplished under these goals?
Mapping unstructured environments and autonomous flight (Thrust A). We fine-tuned the autonomous platform to fly autonomously using LiDAR-based odometry over long time horizons. Experiments were conducted on the Penn campus, at the Virginia State Forest, and at the Wharton State Forest. In our longest experiments, we were able to fly (a) non-autonomously in open areas for close to 5 km with a drift in the state estimate of less than a meter, (b) autonomously in uncluttered areas with few branches, (c) non-autonomously in more cluttered areas with more branches, and (d) at both day and night times. These experiments and the ensuing analysis suggest that (a) our current autonomy stack can effectively handle the requirements of this project, e.g., mapping large regions with low drift and flying close enough to large trees to gather actionable measurements, and (b) the platform is stable enough to obtain precise measurements comparable to a hand-held sensor rig. We have developed a hand-held sensor rig (with a LiDAR, RGB cameras, an IMU, and a GPS, all connected to an onboard computer) to help with mapping. These sensors are the same as those on the autonomous platform. The sensor rig gives us the flexibility to move around the trees (by carrying it over the shoulders in a backpack) and obtain more precise and exhaustive measurements than those allowed by the autonomous platform. The estimates obtained using data from this sensor rig will be compared to those obtained from the flying platform.
Active semantic mapping and perception (Thrust A). We developed algorithms to perform semantic mapping, i.e., identify different objects in the environment and categorize them (primarily instances of cars, pedestrians, and trees). This has been evaluated extensively, e.g., near the Penn campus and at West Point on ~2 km loops with multiple objects (e.g., a row of cars parked along the road). The highlights of this approach are that we can robustly detect and categorize objects from partially occluded viewpoints and estimate their orientation precisely using PCA (tested for cars). For active perception, we have built upon a theory in computational neuroscience called "slow feature analysis" (SFA), which defines objects as sets of features that vary slowly in time. SFA and its variants maximize the mutual information between representations of the input across time. Our formulation of active perception revolves around choosing actions that break this invariant of SFA, i.e., actions that minimize the mutual information between the representations of successive images across time while minimizing a control cost. In simulation experiments using 2D scenes, we instantiated this principle to demonstrate trajectories for agents that explore the scene and show a "curious but busy bee" behavior, i.e., the agent explores an object until it learns enough invariant features about it and then moves on to the next object.
Estimation of tree volume (Thrusts B/C). We have developed a procedure to estimate the volume of tree trunks in LiDAR data. This involves (i) a procedure to quickly annotate point clouds corresponding to tree trunks from a top-down view, (ii) a deep network that segments tree trunks in a depth image created from the LiDAR point cloud, trained on data created from the annotation, and (iii) a procedure to estimate the diameter of the tree trunk (by fitting a circle to the LiDAR points) at various heights; a minimal sketch of this circle-fitting step appears at the end of this section.
As an initial experiment, trees representative of the Appalachian Mountain region in Virginia were selected for destructive sampling. Prior to felling, trees were scanned with the robotic sensors. Following this, trees were felled and detailed stem profile data were collected. These data include diameters at every meter along the stem, the diameter at breast height (1.37 m or 4.5 ft), the diameter at 5.27 m (17.3 ft) to construct Girard form class, bark thickness, and the total height of the tree. The species of these trees was noted along with any abnormalities of the stem. These detailed stem profile data are being used as ground truth to evaluate the effectiveness of robotic height and diameter measurements. These data are also used to calculate the stem volume. Known volumes have been compared with parametric prediction models and non-parametric methods using robotics data. The following table shows this comparison for the initial experiment.
DBH (cm) | Height (m) | Parametric Volume (m^3) | Measured Volume (m^3)
15.7 | 17.4 | 0.158 | 0.178
17.9 | 17.8 | 0.211 | 0.236
42.5 | 21.9 | 1.233 | 1.742
30.5 | 20.3 | 0.590 | 0.803
28.2 | 22.7 | 0.619 | 0.747
26.1 | 21.2 | 0.497 | 0.610
31.3 | 21.2 | 0.706 | 0.750
We compared the estimates obtained by the robotic platform to those in the above table estimated by the Virginia Tech team. Across the 8 trees that were felled in the first stage of the experiment, the median errors (in cm, with the table above as ground truth) were:
Platform | DBH | Profile
Hand-carry | 0.55 | 1.1
Flight | 1.05 | 1.05
This is remarkable on two counts: (i) it indicates that estimates of these quantities (both the DBH, which is a key metric in forestry, and the profile of the diameter with respect to tree height, which is used for parametric volume estimation) are within 2% of the estimates obtained by experts, and (ii) the estimates obtained from the flying platform are roughly similar in accuracy to the ones obtained using the hand-carried sensor rig.
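The diameter-estimation step referenced above fits a circle to a horizontal slice of trunk points. The following is a minimal, self-contained sketch of that idea using a linear least-squares (Kasa) circle fit; the slice thickness, the breast height of 1.37 m, and the synthetic half-visible trunk are illustrative assumptions rather than the project's exact procedure.

```python
# Hedged sketch: estimate diameter at breast height (DBH) by fitting a circle
# to a thin horizontal slice of segmented trunk points.
import numpy as np

def fit_circle(xy):
    """Least-squares (Kasa) circle fit; returns (center_x, center_y, radius)."""
    x, y = xy[:, 0], xy[:, 1]
    A = np.column_stack([2 * x, 2 * y, np.ones(len(x))])
    b = x**2 + y**2
    (cx, cy, c), *_ = np.linalg.lstsq(A, b, rcond=None)
    return cx, cy, np.sqrt(c + cx**2 + cy**2)

def dbh_from_slice(points, breast_height=1.37, slice_half_width=0.05):
    """Estimate DBH (m) from trunk points (N x 3, meters) around breast height."""
    mask = np.abs(points[:, 2] - breast_height) < slice_half_width
    _, _, r = fit_circle(points[mask, :2])
    return 2.0 * r

# Synthetic trunk: radius 0.15 m with a little sensor noise, scanned from one side only.
rng = np.random.default_rng(1)
angles = rng.uniform(0, np.pi, 400)
z = rng.uniform(1.2, 1.5, 400)
pts = np.column_stack([0.15 * np.cos(angles), 0.15 * np.sin(angles), z])
pts[:, :2] += rng.normal(scale=0.005, size=(400, 2))

print(f"estimated DBH: {100 * dbh_from_slice(pts):.1f} cm (true 30.0 cm)")
```

A least-squares circle fit degrades gracefully when only part of the stem is visible to the sensor, which is why it is a reasonable stand-in here; robust variants (e.g., RANSAC around the same fit) would be needed for cluttered slices with branches or undergrowth.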

Publications