Source: AZAVEA, INC. submitted to
OPENTREEID: ADVANCING COMMUNITY FORESTRY WITH HUMAN-AUGMENTED COMPUTER VISION
Sponsoring Institution
National Institute of Food and Agriculture
Project Status
TERMINATED
Funding Source
Reporting Frequency
Annual
Accession No.
1009614
Grant No.
2016-33610-25462
Project No.
PENK-2016-00603
Proposal No.
2016-00603
Multistate No.
(N/A)
Program Code
8.1
Project Start Date
Aug 15, 2016
Project End Date
Feb 14, 2017
Grant Year
2016
Project Director
Cheetham, R.
Recipient Organization
AZAVEA, INC.
340 NORTH 12TH STREET, SUITE 402
PHILADELPHIA,PA 19107
Performing Department
(N/A)
Non Technical Summary
The proposed research will develop new tools for automated tree species identification that will advance the collaboration of government agencies, citizen groups, and nonprofit organizations that is critical to contemporary community forestry. Proper species identification is particularly important during a tree inventory, since many diseases and infestations are species-specific. Yet the large number of species for which routine identification is required can be extremely problematic for citizen volunteers with minimal horticultural training. OpenTreeID will leverage current advances in machine learning, visual recognition, and computer vision techniques to reduce the amount of volunteer time spent sorting through potential species options in tree keys or field guides while improving the overall accuracy rate of crowdsourced tree data for host agencies. The time saved on routine tree identifications can then be invested in other tree management activities to support a healthy community forest.

Healthy street trees slow the accumulation of greenhouse gases, intercept stormwater runoff, improve air quality, reduce noise levels and surface temperatures, create wildlife habitat, increase property values, and provide shade and windbreaks that reduce business and household energy consumption. Proper management of the community forest through greater understanding of its composition will help maximize these benefits and improve the quality of life for local citizens.
Animal Health Component
0%
Research Effort Categories
Basic
(N/A)
Applied
(N/A)
Developmental
100%
Classification

Knowledge Area (KA) | Subject of Investigation (SOI) | Field of Science (FOS) | Percent
124 | 7410 | 2080 | 70%
903 | 7310 | 3020 | 10%
123 | 0530 | 3100 | 20%
Goals / Objectives
The primary goal of the proposed research is to develop new tools for automated tree species identification that will advance the collaboration of government agencies, citizen groups, and nonprofit organizations that is critical to contemporary community forestry. Proper species identification is particularly important during an inventory, since many diseases and infestations are species-specific. That said, the large number of species for which routine identification is required can be extremely problematic for citizen volunteers with minimal horticultural training. Automated species identification systems can reduce the amount of volunteer time spent manually sorting through potential species options in tree keys or field guides while improving the overall accuracy rate of crowdsourced tree data for host agencies.

One example of such a system is Leafsnap, a mobile application that uses computer vision to help identify tree species from photographs of their leaves taken against a solid, light-colored background. Leafsnap is able to distinguish between leaf and non-leaf images submitted for identification, segment the leaf from its background, evaluate the leaf's contour, and rapidly iterate through a database of 185 tree species common to the Northeastern United States before returning a list of up to twenty-five potential matches. Users must then sort through the images and refer to supplementary content before making the final identification. While Leafsnap has made significant contributions to the field of species identification, its limited geographic scope, match accuracy rate, and features, combined with the lack of both a sustainable business model and shared source code, have significantly reduced its potential impact on collaborative community forestry.
The proposed research will address these limitations by combining advances in visual recognition and species identification techniques with demonstrated community forestry software experience in a next-generation tree species identification system supported by a long-term business model. In so doing, this project will build on existing investments by USDA in collaborative inventory and planning activities in communities around the country. Azavea's Phase I research will focus on the achievement of four specific project objectives:

1. Compile a Training Database
An automated species identification system serves as a subject matter expert of sorts by rapidly iterating through a curated database of high-resolution images and attempting to match them to unknown images submitted by users. Creating this curated "training database" is essential before other development work can begin. The images will be used to train the system to recognize species by selectively weighting a series of variables and self-adjusting again and again to maximize identification accuracy.

2. Determine a Matching Methodology
One of the most critical technical challenges for this research will be determining the optimal combination of statistical approach and morphological characteristics needed to achieve a high match-accuracy rate. After evaluating several potential combinations, Azavea will supplement the designated elements of statistics and morphology with a series of visual questions. Under this methodology, the system will dynamically pose questions to the user in an intelligent order based on the content of the submitted image and the initial list of probable matches. The questions will be easy for a human to answer, such as whether the bark is rough or smooth, but comparatively difficult for the computer to resolve. The user's answers are then treated probabilistically and, when combined with computer vision, incrementally refine the probability distribution to determine the most likely species match.

3. Develop Identification Algorithms
Once the matching methodology is determined, it will be necessary to develop algorithms that will both automate the identification process and enable it to be refined and enhanced through machine learning techniques as each new image is added to the system. Rather than attempting to write these algorithms entirely from scratch, Azavea will use an open source computer vision library called OpenCV as the springboard for algorithm development. Selected algorithms will need to be adapted and/or rewritten to enable the dynamic species identification and continuous database training process this research will require.

4. Design and Build a Local Prototype
Creating a compelling user experience that will support community forestry projects is an important component of project success. Azavea's user-centered design approach emphasizes an iterative design process that begins with user research, analysis, and prototyping before any software code is written. Working with two community forestry subject matter experts, Azavea will prepare a customized set of wireframe screen designs as the template for the Phase I prototype. The wireframes will demonstrate how users will interact with the system to photograph trees or tree parts, initiate the automated species identification process, and respond to probabilistic questions. Once the user experience has been designed, Azavea developers will build a robust technology architecture to support it. The dynamic use of both distributed computing clusters and parallel machines has shown significant potential in the literature as a computational environment for automated species recognition. Azavea will evaluate the GeoTrellis high performance geoprocessing engine and programming toolkit as the potential development framework for this project.
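The probabilistic refinement described in Objective 2 can be illustrated with a simple Bayesian update, in which each user answer rescales the species probabilities produced by the vision system. This is only an illustrative sketch: the species names, prior probabilities, and answer likelihoods below are hypothetical values, not figures from the project.

```python
def bayesian_update(priors, likelihoods):
    """Multiply each species' prior probability by the likelihood of the
    user's answer given that species, then renormalize to sum to 1."""
    posterior = {s: priors[s] * likelihoods.get(s, 0.0) for s in priors}
    total = sum(posterior.values())
    return {s: p / total for s, p in posterior.items()}

# Hypothetical output of the vision model for one submitted leaf photo.
priors = {"red maple": 0.45, "sugar maple": 0.40, "pin oak": 0.15}

# Hypothetical P(answer = "rough" | species) for the question
# "Is the bark rough or smooth?".
likelihoods = {"red maple": 0.2, "sugar maple": 0.3, "pin oak": 0.9}

posterior = bayesian_update(priors, likelihoods)
best_match = max(posterior, key=posterior.get)
print(best_match)  # the oak overtakes both maples after the bark answer
```

Each additional question repeats the same update against the previous posterior, so questions that are easy for a human to answer steadily concentrate probability on a single species.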
Project Methods
Using a subset of images from the Leafsnap database for species that are common to community forests, Azavea will build a prototype software application that can improve on existing methods for identifying tree species. A hybrid approach will combine computer vision, image processing, and pattern recognition techniques with "human-in-the-loop" concepts to incrementally refine the probability distribution of potential responses and determine the most likely species match for each image submitted to the system. A similar approach was successfully tested by others on an image dataset of 200 bird species, which makes it particularly promising for community forestry: as with trees, many bird species are visually similar to one another and could not otherwise be correctly identified by non-experts.

In order to develop probabilistic questions specific to trees and to oversee identification accuracy, Azavea will rely on two subject matter experts with a strong background in both community forestry and tree species identification. Once the entire prototype has been designed and built, Azavea and its advisors will perform limited match accuracy testing using sample images collected in the field. Overall performance and user workflow will also be tested with a small group of community foresters.

Progress 08/15/16 to 02/14/17

Outputs
Target Audience: Azavea has developed a network of partner organizations that advise on the development of our community forestry products, ensuring that they meet the combined needs of our designated target audiences. Through consultation with our partners, combined with extensive e-mail and phone outreach undertaken during the grant timeframe, Azavea has both confirmed our original target audiences of municipal governments and citizen science organizations for OpenTreeID and identified two additional potential audiences. First, middle schools, high schools, colleges, and universities are potentially interested in making tree species identification a regular part of their curriculum. There is also a growing number of citizen science nonprofits that support environmental learning activities outside the traditional classroom. OpenTreeID offers an interactive e-learning experience that will meet a broad range of educational needs. Second, based on our success in applying computer vision to tree leaves for species identification, we also explored using machine learning and computer vision for classifying land use and tree canopy from aerial imagery. Many municipal governments, regional planning commissions, and sustainability groups use canopy coverage as a method for understanding the urban forest and setting tree planting goals, so adding these capabilities to OpenTreeID offers additional value to this audience. Our initial research found that machine learning techniques applied to aerial imagery can efficiently extract land use classifications, canopy coverage percentages, and potentially species distributions. The limited timeframe for our Phase I project, combined with leaf-off conditions when the prototype was completed, meant that hands-on testing was necessarily limited. Two urban forestry subject matter experts represented the needs of our target audiences. Their feedback and testing of the completed application were informative and resulted in a better overall user experience. The resulting prototype is more effective for both urban forestry organizations and individual users because of their contributions. Further, our extension of the effort to aerial imagery enabled us to work with historic imagery and perform comparison tests for a broader range of time periods.

Changes/Problems: Much of Azavea's Phase I effort focused on researching and selecting a matching methodology that, combined with questions posed to the user, would create a higher match-accuracy rate than existing tree species identification systems. In so doing, we identified and addressed three unexpected challenges to our original approach and forged a new path for OpenTreeID that will significantly expand its commercialization potential.

Challenge 1: Switching Matching Methodologies
One of the most critical technical challenges for this research was determining the optimal combination of statistical approach and morphological characteristics needed to achieve a high match-accuracy rate. We reviewed two main approaches with several specific variations within each. The first was to improve the segmentation algorithm, which enables the software to recognize what is a leaf and what is the background image. The second was to implement a convolutional neural network, which learns the expected outcomes for a proposed question based on the answers fed into the system through a series of training images. While we had originally planned to improve the segmentation algorithm, our research suggested that implementing a convolutional neural network would increase the likelihood of more accurate matches, and our testing confirmed extremely accurate results.
That said, the decision to switch from a segmentation process to a convolutional neural network did cause a slight delay in beginning the initial development of the project and required us to gain more experience in implementing neural networks. However, that experience is beneficial both to expanding OpenTreeID in a future phase of the project and to applying machine learning to other projects at Azavea.

Challenge 2: Developing Human-in-the-Loop Questions
Creating the list of questions to pose to users was more challenging than we expected. Many of the questions we considered did not have clear answers that non-arborists could distinguish by looking at the tree, or depended on tree features, such as flowers and fruit, that are not visible year-round. While the four questions we implemented produced more accurate results, some questions regarding the shape of the leaf or the description of the leaflets reflect characteristics that the neural network should identify on its own. Ideally, the probabilistic questions would relate to information such as the shape of the tree, bark color, fruit, or other species characteristics not connected to the leaf, because the neural network has already identified leaf features.

Challenge 3: Testing and Expansion
Azavea completed the OpenTreeID prototype application in January 2017, which made it very challenging to conduct any field testing in Philadelphia due to leaf-off conditions. While we were able to find a few leaves on trees near our homes and office to use for testing, we were not able to use the application extensively in the field. For that reason, we completed the majority of testing using leaf images found online. The success metric we articulated in our Phase I proposal was to exceed the 68% first-place match-accuracy rate achieved with Leafsnap.
While we expected the convolutional neural network to result in accurate species identification, our match-accuracy rates averaged 96%, generally exceeded 80%, and reached 99% in many cases. This success exceeded our most ambitious expectations. Initially, we were concerned that the high level of accuracy indicated an error in our programming and that the tests did not reflect accurate results. After reviewing our work in training the neural network and the image databases, we confirmed the accuracy of the test results. Even more exciting from a commercialization perspective was the realization that we could apply similar machine learning techniques to other underserved needs in community forestry, including extrapolating tree canopy assessment, tree genus and species distribution, and land use classification data from publicly available satellite imagery. We are only a few years away from being able to capture remotely sensed imagery for any location on the planet on a daily or even more frequent basis. As data volume goes up and the cost of acquiring data is driven down, the chief value is in making the imagery useful by either processing it for visualization or transforming it into new information products. Extending OpenTreeID to support land use classification from remotely sensed data provides a substantial head start on meeting these challenges and enables global application to a growing array of industries that extend far beyond community forestry.

What opportunities for training and professional development has the project provided? The Urban Ecosystems team completed primary development on OpenTreeID, but three software engineers from other teams had considerably more experience with convolutional neural networks (CNNs) and machine learning. Azavea was able to coordinate multiple project deadlines so that these engineers could contribute to research and decision-making tasks on OpenTreeID. This saved us from completing extensive and potentially costly tasks implementing an alternate species identification methodology that was unlikely to result in more accurate matches. At the same time, it provided valuable cross-team collaboration and knowledge-sharing that will benefit other projects going forward. Finally, it enabled the software developers and project manager on the Urban Ecosystems team to gain valuable experience and new skills with respect to machine learning and computer vision.

How have the results been disseminated to communities of interest? Azavea reviewed the prototype application with the technical and urban forestry consultants on our advisory board and made changes based on their feedback. While the prototype will not be released to the general public, we have demonstrated the application to several colleagues working in the urban forestry field and received positive reviews. Our research into using machine learning on aerial imagery also demonstrated successful results. We submitted these results to a global competition organized by the International Society for Photogrammetry and Remote Sensing (ISPRS) and are currently generating the second most accurate results (a group from the Chinese Academy of Sciences is generating the most accurate results, at 91.1% accuracy). A blog post has been published regarding this portion of the project (https://www.azavea.com/blog/2017/05/30/deep-learning-on-aerial-imagery/). The software source code is currently hosted on GitHub and will be featured in a forthcoming edition of Azavea's company newsletter.

What do you plan to do during the next reporting period to accomplish the goals? Nothing Reported

Impacts
What was accomplished under these goals? Azavea's goal for OpenTreeID is to significantly lower the barriers to entry for small to mid-sized communities that cannot currently afford state-of-the-art data collection and analytical tools for tree canopy assessment, land use classification, and species identification. The consulting costs to gather and interpret this type of data currently range from tens of thousands to many hundreds of thousands of dollars for a single community. The ability to perform such services in-house can be similarly constrained by the need for advanced analytical skills, specialized software, sophisticated classification schemes, and significant investments in staff time. OpenTreeID offers automated tree species identification, tree canopy assessment, and land use classification capabilities in an easy-to-use, subscription-based solution specifically designed for organizations with limited technical or financial resources. To our knowledge, no other community forestry solution currently provides this combination of capabilities. The original intent of our Phase I project was to improve upon existing tree identification systems, such as Leafsnap, by combining computer vision and machine learning techniques with human decision-making. This, in turn, would increase the overall accuracy rate of the crowdsourced tree data that is critical to contemporary community forestry. To that end, we identified and addressed four key technology objectives:

1. Compile a Training Database
Working with two community forestry advisors, we selected twenty-five species commonly found on urban streets and in parks in the Northeastern United States. We also created a database of images from existing publicly available databases to train the automated species identification system. We later expanded this species list and the corresponding images based on successful testing of our twenty-five initial species.

2. Determine a Matching Methodology
Azavea chose a matching methodology combined with questions posed to the user to create a higher match-accuracy rate. After assessing two potential approaches, we implemented a convolutional neural network, an image classification algorithm that learns the expected outcomes for a proposed problem (such as selecting the correct tree species) by comparing an input image with a curated set of training images. We also developed a series of leaf-related questions to be posed to the user in order to narrow down the potential results identified by the software.

3. Develop Identification Algorithms
Azavea consulted with a machine learning advisor to select and update a convolutional neural network model that automates the species identification process. We then trained the network with photos of leaves, including images with plain backgrounds and images with noisy backgrounds such as sidewalks and other leaves, to enable it to distinguish one species from another.

4. Design and Build a Local Prototype
The resulting prototype uses Keras, a high-level neural networks library; TensorFlow, an open source machine learning library originally from Google; and Amazon Web Services, a cloud computing infrastructure. It is accessible as a web application optimized for the Chrome browser on mobile devices. In initial testing, our prototype achieved accuracy results that significantly exceeded our metrics for success. Users open the OpenTreeID web application, hold a leaf in front of the camera on their smartphone, and quickly receive suggestions on the proper identification of the tree's species. OpenTreeID's first-place match-accuracy rate is between 88% and 95%, a significant improvement over Leafsnap's 68% match-accuracy rate with the same species.
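The stack named above (Keras on TensorFlow) allows a leaf classifier of this kind to be defined compactly. The sketch below is an assumption for illustration, not the project's actual architecture: the layer sizes and input resolution are invented, and only the 25-species output size comes from the report.

```python
from tensorflow import keras
from tensorflow.keras import layers

NUM_SPECIES = 25  # initial species list from the Phase I work

def build_leaf_classifier(input_shape=(224, 224, 3), num_classes=NUM_SPECIES):
    """A small convolutional network mapping a leaf photo to a
    probability distribution over species via a softmax output."""
    model = keras.Sequential([
        keras.Input(shape=input_shape),
        layers.Conv2D(32, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(128, 3, activation="relu"),
        layers.GlobalAveragePooling2D(),
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model

model = build_leaf_classifier()
```

Training such a model on both plain-background and noisy-background leaf photos, as described above, is what lets the network skip a separate segmentation step.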
Based on successful completion of the four technical objectives outlined above, and with time remaining in the grant period, we expanded our convolutional neural network implementation to explore how the same class of machine learning techniques could be applied to aerial and satellite imagery in order to distinguish between different types of land cover on a pixel-by-pixel basis. Land cover and canopy coverage are important components of community forestry plans, and canopy coverage goals serve as a benchmark for future tree planting and maintenance. Using semantic segmentation, a machine learning process that makes a classification determination for each pixel in an aerial image, Azavea has been able to automate the application of land use classes that include trees, impervious surfaces, buildings, low vegetation, cars, and clutter/background. Further, we have identified multiple potential methods for extrapolating tree genus and species distribution from aerial imagery. Use cases in agriculture, international development, urban and regional planning, environmental management, public utilities, and sustainable transportation will range from vegetation analyses to watershed management and from targeting locations for green infrastructure to studying forest fragmentation over time.
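As a sketch of how a semantic-segmentation output turns into canopy metrics, per-pixel class scores can be reduced to coverage fractions with an argmax and a count. The class list comes from the report; the random scores below merely stand in for a real network's output.

```python
import numpy as np

# Land use classes applied by the segmentation work described above.
CLASSES = ["tree", "impervious", "building", "low_vegetation", "car", "clutter"]

def coverage_from_scores(scores):
    """scores: (H, W, num_classes) array of per-pixel class scores.
    Returns the fraction of pixels assigned to each class."""
    labels = scores.argmax(axis=-1)                       # per-pixel decision
    counts = np.bincount(labels.ravel(), minlength=len(CLASSES))
    return dict(zip(CLASSES, counts / labels.size))

# Stand-in for network output over one 64x64 image tile.
rng = np.random.default_rng(0)
scores = rng.random((64, 64, len(CLASSES)))
coverage = coverage_from_scores(scores)
canopy_percent = 100 * coverage["tree"]
```

Aggregating the "tree" fraction over all tiles of a city's imagery yields the canopy coverage percentage that tree planting goals are benchmarked against.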

Publications

  • Type: Other Status: Published Year Published: 2017 Citation: https://www.azavea.com/blog/2017/05/30/deep-learning-on-aerial-imagery/


Progress 08/15/16 to 02/14/17

Outputs
Target Audience: Our target audiences for this project are the municipal governments, nonprofits, and citizen foresters that contribute to the health and maintenance of the community forest. While we did not work directly with any of these audiences during this reporting period, we made significant progress toward developing an application that will meet their needs. Our urban forestry consultants provided valuable expertise and thoughtfully considered which species to represent in the Phase I prototype based on the various reasons our target audiences may be using the software. They raised questions we had not previously considered regarding invasive species, pests and diseases, and the process of posing questions to users. The resulting software is much more useful to both urban forestry organizations and individual users because of their contributions to the project.

Changes/Problems: Azavea encountered three unexpected challenges that needed to be addressed during the first three months of this project. First, we had other urban forestry work that temporarily shifted our focus. OpenTreeID may eventually be included as an addition to OpenTreeMap, our software for collaborative urban tree inventory. Over forty cities, non-profit groups, and businesses now subscribe to the OpenTreeMap Software-as-a-Service (SaaS) system, and several of those clients have contracted with us to add custom features to the system. Our success with OpenTreeMap has provided us with both revenue and a wealth of user feedback, but it has required us to periodically focus on client-funded features that had not previously been included in the development schedule. We currently expect that these additional development obligations will be reduced over the next few months. Second, implementing a convolutional neural network is a different approach than we had originally planned. Our research suggests that this approach will improve the likelihood of more accurate matches.
This conclusion is based on our research into neural networks as well as a thorough literature review showing demonstrated success with using CNNs on other plant-related machine learning projects. This shift in methodology may not result in the successes we expect, however, and, if that turns out to be the case, we will need to return to our methodology research and test a different approach. Third, while we had originally planned to design the system to use GeoTrellis, Azavea's high performance geoprocessing engine and toolkit, additional research found that Amazon Web Services provides GPU-based instances that will speed up the convolutional neural networks significantly compared to traditional Central Processing Units (CPUs). This will provide sufficient processing capabilities to support the project.

What opportunities for training and professional development has the project provided? Through its focus on machine learning, this project has provided opportunities for professional development and knowledge sharing across multiple software teams at Azavea. When we originally proposed the OpenTreeID system, we had primarily considered implementing a methodology that improved upon the segmentation algorithm used by an existing system. Upon further research, however, we decided to implement a convolutional neural network to support recognizing the leaf without a separate segmentation step. That decision was made after extensive research and collaboration with software engineers on multiple teams at Azavea. Roundtable discussions between these engineers and the Urban Ecosystems team that will be primarily responsible for developing OpenTreeID saved us from completing extensive and potentially costly tasks or implementing a methodology that was unlikely to result in more accurate matches. In so doing, this knowledge exchange benefited both the Urban Ecosystems team and the project as a whole.
How have the results been disseminated to communities of interest? The results of this project to date have been shared with members of the Azavea staff as well as our urban forestry consultants and our technical consultant. We anticipate broader dissemination during the last half of the project, once the prototype is operational.

What do you plan to do during the next reporting period to accomplish the goals? In the second half of the grant period, Azavea will test our species identification methodologies against sample data and work with our urban forestry and computer vision consultants to create a user interface for the OpenTreeID web application. Creating this interface will enable us to conduct some basic user testing and gain valuable insight into user needs to inform our commercialization plans, which currently focus on non-profit groups and government agencies.

Impacts
What was accomplished under these goals? The first half of Azavea's Phase I project has focused on identifying a matching methodology to improve upon existing leaf identification systems and building a prototype application to test its utility. OpenTreeID combines computer vision techniques with human decision-making to more efficiently and accurately identify a tree's species based on the shape of its leaves. Users will open the OpenTreeID web application, hold a leaf in front of the camera on their smartphone, and quickly receive suggestions on the proper identification of the tree's species. By using a mobile phone, which many individuals already carry with them regularly, OpenTreeID eliminates the need for a bulky tree identification booklet and enables people to learn and contribute useful data about the urban forest at any time. Azavea is developing OpenTreeID by way of four technical objectives and a series of related tasks. Our first objective was to compile a training database of leaf images that will be used to train the automated species identification system. Working with two urban forestry subject matter experts, we selected twenty-five species for this initial phase of work that are common to our pilot location, the City of Philadelphia. We researched possible training image datasets from publicly available projects including Leafsnap, PlantCLEF at the LifeCLEF Lab, and the Leaf Classification Project using images from the Royal Botanic Gardens, Kew, United Kingdom. Only the Leafsnap database included images for all the species selected by the urban forestry consultants and featured a variety of images captured in the field, which assists with training the system. We therefore plan to use Leafsnap imagery for our training database. Our second objective focuses on researching and selecting a matching methodology that will combine with questions posed to the user to create a higher match-accuracy rate.
After an extensive review of various machine learning methodologies, we chose to implement a convolutional neural network based on previous research showing success with such networks for plant identification. Because it is not possible for computer vision and machine learning to resolve all identifications accurately 100% of the time, we are also creating a database of species characteristics (bark texture, leaf size, etc.) that we will use to ask users questions about the trees. We expect that adding this "human-in-the-loop" element to distinguish between two possible species suggested by the neural network will greatly increase the accuracy of results. Our third objective is focused on developing algorithms that will both automate the identification process and enable it to be refined and enhanced through machine learning techniques. In coordination with our machine learning subject matter expert, Azavea staff reviewed the OpenCV (Open Source Computer Vision Library) computer vision and machine learning software library for potential algorithms that could be adapted to this project. After much discussion, we chose to proceed with a convolutional neural network approach, in which a software program learns the expected outcomes for a proposed question based on the answers fed into the system through a series of training images or questions. Training a neural network with photos of leaves in multiple orientations, including both plain backgrounds and consistent but noisy backgrounds, like sidewalks, will result in a network that can accurately identify the leaf species without an additional segmentation step to separate the leaf from the background. Should the convolutional neural network return results that are not as accurate as we would like, we may implement a segmentation step later in order to improve match accuracy.
Our fourth and final objective is to design and build a local prototype as a web application that is viewable on mobile and tablet devices. We expect to hold our design charrette in the coming weeks with our urban forestry consultants and the User Experience (UX) designers at Azavea. The charrette will include developing personas to represent the expected users of the system and creating rough sketches based on their goals for the application.

Publications