Progress 08/15/14 to 08/14/17
Outputs
Target Audience: Robotics community
Changes/Problems:
Nothing Reported
What opportunities for training and professional development has the project provided? We have supported a graduate student through this program.
How have the results been disseminated to communities of interest? See publications.
What do you plan to do during the next reporting period to accomplish the goals?
Nothing Reported
Impacts
What was accomplished under these goals?
Major goals of the project
The goal of this work is to develop a general approach for the detection and tracking of people in agricultural environments, which we believe is a requirement for the safe and effective deployment of robotic agricultural equipment in many applications.
Objectives:
1. Develop an approach for human detection and tracking in agricultural environments from a moving vehicle, specifically including the following capabilities:
1A. Cases where people are partially occluded by vegetation
The system we developed can detect partially occluded people. In addition, we have tested various approaches and benchmarked their performance on the reference data set that we collected.
1B. Cases where people are in non-standing configurations such as squatting, bent over, or on the ground
We have a more limited data set for people in these configurations, which means less training data for the learning algorithms; results so far are positive but preliminary.
1C. Cases with moving and falling people
The system works well at detecting moving people, and the reference data set contains many labeled examples of moving people.
2. Make the approach practical for a wide range of agricultural systems:
2A. Computationally able to run in real-time on a robotic platform
The system can run in real time on a computer with a fast GPU. It currently requires a large GPU, but Nvidia and other companies are already planning embedded products with performance similar to today's large GPUs. These products will enable the algorithms to run in real time on robotic platforms.
2B. Use relatively low cost sensors such as stereo cameras
We already use low-cost, low-resolution stereo cameras for all of our work.
2C. Not require precise GPS
The system does not rely on precise GPS; in fact, it does not need GPS at all.
3. Create tools that allow the system to be adapted to a new application environment with a minimum of manual hand-tuning
This is the objective on which we made the least progress: we did not get as far as we wanted in developing these tools. Even so, we have applied the system to other environments with good success, and the process required only a minimal amount of tuning.
4. Enable future research in this area by creating a public benchmark labeled dataset in this domain
The labeled dataset has been made publicly available on our website.
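The benchmarking of detector performance mentioned under Objective 1A depends on a rule for matching detections against labeled boxes. As a rough illustration only (the report does not specify its matching procedure; the intersection-over-union threshold, greedy matching, and example boxes below are all assumptions), such scoring might look like:

```python
# Illustrative sketch of scoring detections against ground-truth boxes:
# a detection counts as a true positive if its intersection-over-union
# (IoU) with an unmatched labeled box exceeds a threshold. All numbers
# and the matching rule are hypothetical, not the report's actual method.
def iou(a, b):
    """Boxes as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def match_detections(dets, gts, thresh=0.5):
    """Greedy matching; returns (true positives, missed people)."""
    unmatched = list(gts)
    tp = 0
    for d in dets:
        best = max(unmatched, key=lambda g: iou(d, g), default=None)
        if best is not None and iou(d, best) >= thresh:
            tp += 1
            unmatched.remove(best)
    return tp, len(unmatched)

gts = [(10, 10, 50, 90), (100, 20, 140, 100)]   # two labeled people
dets = [(12, 12, 52, 88)]                        # overlaps the first only
tp, misses = match_detections(dets, gts)
print(tp, misses)  # 1 1
```

Per-frame counts like these are what aggregate metrics (e.g. miss rate at a given false-positive rate) are built from.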
Publications
- Type: Journal Articles
  Status: Published
  Year Published: 2017
  Citation: Pezzementi Z, Tabor T, Hu P, et al. Comparing apples and oranges: Off-road pedestrian detection on the National Robotics Engineering Center agricultural person-detection dataset. J Field Robotics. 2018;35:545–563. https://doi.org/10.1002/rob.21760
Progress 08/15/14 to 08/14/15
Outputs
Target Audience: As per the work plan in our proposal, the first year's work was focused on data collection, system development, and initial experimentation, with a greater focus on publication and dissemination to the research community in the second year. The main audience reached so far is internal to Carnegie Mellon University, but it will broaden significantly upon acceptance of our pending conference submission.
Changes/Problems:
Nothing Reported
What opportunities for training and professional development has the project provided? The project provided partial support for two engineers to attend the Conference on Computer Vision and Pattern Recognition (CVPR 2015) to stay up-to-date with the latest advances in the field. A PhD student (Benzun Babu) has been hired for a research internship for Sep 2015-Jan 2016.
How have the results been disseminated to communities of interest? Progress on the project to date was presented in a seminar to a Carnegie Mellon computer vision group on March 20, 2015. A Carnegie Mellon seminar and tutorial presenting the CNN library developed for this project was held on May 15, 2015. The work on appearance and context is in submission to the 2015 IEEE International Symposium on Safety, Security, and Rescue Robotics.
What do you plan to do during the next reporting period to accomplish the goals? We plan to conduct an additional data collection trip to record logs of a new environment to support work on generalizing the approach to different domains. The next targeted environment is a Pennsylvania apple orchard, which has a different appearance from the citrus orchard but a similar structure, allowing the same types of logs to be collected. We also plan to develop a set of automated data processing tools to run on the existing orange grove data and extend to the apple orchard data. This will include a pipeline for general classifier training (supporting ACF and CNN) using an extensible set of generic feature channels. Further work is planned on our CNN library as well, to improve the training and testing pipeline for more rapid iteration and to extend it to support end-to-end training of the full detection process. This training pipeline will then be used to incorporate the features for geometry, motion, and anomaly detection that were developed this year, as described above.
We will also compare against existing features that have shown success in the pedestrian detection literature. Initial experiments will use ACF to explore the space of possible features, and promising combinations will then be evaluated using CNNs. The results of these experiments will be analyzed and published. Next, more temporal information will be added to allow classification over tracked sequences rather than individual frames, so that temporal information is incorporated into the classification process itself rather than as a post-filtering step. We have several activities planned for distributing our data set to the community: We will design a final format for the data to support maximal reuse and interoperability with other related data sets. We will organize a workshop to introduce the benchmark to the community. We will build a website to make the data set available and provide tools for interacting with it. This same website will also be usable by other Carnegie Mellon groups to publish datasets.
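The "extensible set of generic feature channels" planned above could take many forms. As a minimal hedged sketch only (the registry design, channel names, and use of a grayscale channel instead of ACF's actual LUV color channels are all assumptions for illustration), a channel-based pipeline in the spirit of ACF might look like:

```python
import numpy as np

# Hypothetical extensible feature-channel registry. The shrink/aggregate
# step mirrors ACF-style aggregated channel features; everything here is
# an illustrative sketch, not the project's actual pipeline.
CHANNEL_REGISTRY = {}

def channel(name):
    """Register a channel-computing function under a string key."""
    def wrap(fn):
        CHANNEL_REGISTRY[name] = fn
        return fn
    return wrap

@channel("gray")
def gray_channel(img):
    # img: H x W x 3 float array in [0, 1]
    return img.mean(axis=2, keepdims=True)

@channel("grad_mag")
def gradient_magnitude(img):
    g = img.mean(axis=2)
    gy, gx = np.gradient(g)
    return np.sqrt(gx ** 2 + gy ** 2)[..., None]

@channel("grad_orient")
def gradient_orientation(img, n_bins=6):
    # Hard-binned gradient-orientation channels, HOG-style.
    g = img.mean(axis=2)
    gy, gx = np.gradient(g)
    mag = np.sqrt(gx ** 2 + gy ** 2)
    ang = np.mod(np.arctan2(gy, gx), np.pi)            # orientation in [0, pi)
    bins = (ang / np.pi * n_bins).astype(int).clip(0, n_bins - 1)
    out = np.zeros(g.shape + (n_bins,))
    for b in range(n_bins):
        out[..., b] = mag * (bins == b)
    return out

def compute_channels(img, names=("gray", "grad_mag", "grad_orient"), shrink=4):
    """Stack the requested channels, then average over shrink x shrink blocks."""
    chans = np.concatenate([CHANNEL_REGISTRY[n](img) for n in names], axis=2)
    h, w, c = chans.shape
    h, w = h - h % shrink, w - w % shrink
    pooled = chans[:h, :w].reshape(h // shrink, shrink, w // shrink, shrink, c)
    return pooled.mean(axis=(1, 3))                    # aggregated channel features

img = np.random.rand(64, 48, 3)
feats = compute_channels(img)
print(feats.shape)  # (16, 12, 8): 1 gray + 1 magnitude + 6 orientation channels
```

A registry like this is what makes the set "extensible": a new geometry or motion channel is added by registering one more function, with no change to the training code.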
Impacts
What was accomplished under these goals?
The aim of this work is to develop a general approach for the detection and tracking of people in agricultural environments, which we believe is a requirement for the safe and effective deployment of robotic agricultural equipment in many applications. This work moves beyond the traditional urban or indoor setting, where the focus is on geometrically distinct walking or standing pedestrians, to complex agricultural environments that often include people who are bent over or in other non-standing postures and also include trees, weeds, crops, and other vegetation that often partially obscure people who are working in or near these natural materials. By collecting data in this domain and applying state-of-the-art methods, we have moved one step closer to producing person detectors that are able to deal with all these challenges to support safe and robust agricultural robotic systems. Objective 1) Develop an approach for human detection and tracking in agricultural environments from a moving vehicle, specifically including the following capabilities: a) Cases where people are partially occluded by vegetation b) Cases where people are in non-standing configurations such as squatting, bent over, or on the ground c) Cases with moving and falling people We collected data (discussed further in Objective 4) covering cases of people occluded by vegetation, people in non-standing configurations, moving people, and falling people. As outlined in our work plan, our efforts in support of Objective 1 are in the areas of Classification with Context, Appearance, Geometry, Anomaly Detection, Motion Estimation, and Tracking: In the areas of Classification with Context and Appearance modeling, we chose three approaches that have had success in the pedestrian detection literature, all of which model appearance in the context of the entire person, to apply to and test on the data set described above.
These methods are the influential Deformable Parts Models (DPM), a random forest-based method called Aggregated Channel Features (ACF), and convolutional neural networks (CNN). We evaluated these methods using standard metrics from the pedestrian detection literature and explored some modifications of these metrics that more appropriately model how a robotic system would perform using these methods in agricultural domains. The evaluation showed similar performance for the three approaches, so we selected ACF and CNN as the focus of future work, since they can readily be extended with additional features. A publication of the results of this work is currently in submission. In the other areas, we developed a set of image features suitable for incorporation into the chosen classification architecture. For Geometry information, we incorporated a Markov-random-field-based ground surface estimator for computing height above the ground, as well as a stereo depth feature. For Anomaly Detection, we built a Gaussian Mixture Model on local (CIE-LAB) color and texture (gradient energy) in background/non-person regions to produce a feature that detects areas of unusual appearance. For Motion Estimation and Tracking, we developed two novel features based on visual odometry, using the residual motion of tracked points after normalizing for estimated ego-motion: The first feature compares the intensity and disparity values of each pixel in the current image with those of the projection of the pixels from the previous image using the current ego-motion estimate. The second feature isolates motion vectors that were detected as outliers from the ego-motion estimation as locations of potential independent motion. A video showing current results of this work is available at https://youtu.be/xAYYr3xBBSY. 
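The anomaly-detection feature above scores image regions by how unlikely their color and texture are under a background model. As an illustrative sketch only (the randomly generated stand-in features, the diagonal-covariance simplification, and every name below are assumptions; the report's model is fit on real CIE-LAB color and gradient-energy values), the idea can be shown with a small Gaussian mixture fit by plain EM:

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_gmm(X, k=3, iters=50):
    """Fit a diagonal-covariance Gaussian mixture with plain EM (sketch only)."""
    n, d = X.shape
    mu = X[rng.choice(n, k, replace=False)]        # k x d component means
    var = np.ones((k, d))                          # k x d variances
    pi = np.full(k, 1.0 / k)                       # mixing weights
    for _ in range(iters):
        # E-step: responsibilities from per-component log-likelihoods
        # (constant terms cancel in the normalization)
        ll = (-0.5 * (((X[:, None] - mu) ** 2) / var + np.log(var)).sum(-1)
              + np.log(pi))                        # n x k
        ll -= ll.max(axis=1, keepdims=True)
        r = np.exp(ll)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means, and variances
        nk = r.sum(axis=0)
        pi = nk / n
        mu = (r.T @ X) / nk[:, None]
        var = (r.T @ (X ** 2)) / nk[:, None] - mu ** 2 + 1e-6
    return mu, var, pi

def neg_log_likelihood(X, mu, var, pi):
    """Anomaly score: higher means more unusual under the background model."""
    ll = (-0.5 * (((X[:, None] - mu) ** 2) / var
                  + np.log(2 * np.pi * var)).sum(-1) + np.log(pi))
    m = ll.max(axis=1, keepdims=True)              # stable log-sum-exp
    return -(m.squeeze(1) + np.log(np.exp(ll - m).sum(axis=1)))

# Stand-in 4-D features per pixel: (L, a, b, gradient energy) -- random
# placeholders here, not real image data.
background = rng.normal(0.0, 1.0, size=(2000, 4))   # typical vegetation regions
unusual = rng.normal(6.0, 1.0, size=(10, 4))        # person-like outliers
mu, var, pi = fit_gmm(background)
score_bg = neg_log_likelihood(background, mu, var, pi).mean()
score_odd = neg_log_likelihood(unusual, mu, var, pi).mean()
print(score_bg < score_odd)  # True: unusual appearance is scored as less likely
```

The per-pixel negative log-likelihood then becomes one more feature channel for the classifier, flagging regions whose appearance the background model cannot explain.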
Objective 2) Make the approach practical for a wide range of agricultural systems: a) Computationally able to run in real-time on a robotic platform b) Use relatively low cost sensors such as stereo cameras c) Not require precise GPS We have restricted our work to rely only on stereo camera images as inputs, using the camera data to estimate the vehicle's motion as well as to perform classification. We collected precision GPS positions only as a reference for ground-truth positioning. Run-time was considered in the design, tuning, and evaluation of the methods discussed above, with a view towards practical embedded use. Objective 3) Create tools that allow the system to be adapted to a new application environment with a minimum of manual hand-tuning We focused our development and evaluation (discussed in support of Objective 1) on data-driven methods, where most specialization to the environment is accomplished through machine learning. Application to a new environment therefore requires new training data for that environment, but a minimum of engineer time. This objective will be a focus of next year's work. Objective 4) Enable future research in this area by creating a public benchmark labeled dataset in this domain We performed a data collection trip to gather logs from our robotic test platform in a citrus orchard. Sets of logs were collected showing people carrying out a standardized set of motions at different distances from the vehicle, while wearing a variety of types of clothing common to the domain. A total of 196 logs in which people were present were collected. We selected 74 of these to be labeled with a bounding box around the position of the person in each frame; these logs were combined with our existing labeled data from previous work and divided into train, test, and validation sets that were used for initial experiments.
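When dividing logs into train, test, and validation sets as described above, frames from the same log are highly correlated, so whole logs (rather than individual frames) should be assigned to a single split. As a hedged sketch only (the split ratios, seed, and log naming below are made-up assumptions, not the report's actual partition), a log-level split might look like:

```python
import random

def split_logs(log_ids, ratios=(0.6, 0.2, 0.2), seed=42):
    """Assign whole logs to train/val/test so correlated frames never
    leak across splits. Ratios and seed are illustrative."""
    assert abs(sum(ratios) - 1.0) < 1e-9
    ids = sorted(log_ids)                     # deterministic starting order
    random.Random(seed).shuffle(ids)          # reproducible shuffle
    n = len(ids)
    n_train = int(ratios[0] * n)
    n_val = int(ratios[1] * n)
    return {
        "train": ids[:n_train],
        "val": ids[n_train:n_train + n_val],
        "test": ids[n_train + n_val:],
    }

logs = [f"log_{i:03d}" for i in range(74)]    # e.g. the 74 labeled logs
splits = split_logs(logs)
print(len(splits["train"]), len(splits["val"]), len(splits["test"]))  # 44 14 16
```

Splitting at the log level is what makes the test numbers meaningful: a detector cannot score well simply by memorizing near-duplicate frames of the same person from the training set.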
Publications
- Type: Conference Papers and Presentations
  Status: Under Review
  Year Published: 2015
  Citation: Trenton Tabor, Zach Pezzementi, Carlos Vallespi, and Carl Wellington. People in the Weeds: Pedestrian Detection Goes Off-road. IEEE International Symposium on Safety, Security, and Rescue Robotics, 2015.