Using Deep Learning Real-Time Embedded System for Pedestrian Detection and Tracking


Bibliographic Details
Main Authors: Chih-Yu Wang, 王志宇
Other Authors: Kuo-Chin Fan
Format: Others
Language: zh-TW
Published: 2018
Online Access: http://ndltd.ncl.edu.tw/handle/v857sp
Description
Summary: Master's thesis === National Central University === Department of Computer Science and Information Engineering === 106 === A vision-based person-following method is important for various applications in human-robot interaction (HRI). The accuracy and speed of the person detector determine the performance of a reliable person-following system. However, state-of-the-art CNN-based object detectors such as YOLO require the large memory and computational resources of high-end GPUs for real-time operation; they cannot run on embedded devices with low-end CPUs or FPGAs. Therefore, this paper presents Brisk-YOLO, a lightweight yet reliable human detector developed by optimizing the model architecture and applying training techniques. This method greatly reduces the computational load while preserving person-detection accuracy. In addition, to further reduce computation cost, the detector is not applied to every frame: it runs only at the beginning, to initialize the target's location, and again to correct accumulated tracking error or to recover when the target is lost or occluded. Fast object-tracking and person re-identification methods were selected to ensure the system runs steadily. Experimental results show that the system achieves real-time, reliable operation on a Raspberry Pi 3 (1.2 GHz ARM CPU, 1 GB RAM) on real-world person-following videos, with accuracy better than other long-term tracking methods. The proposed system can re-identify a person after periods of occlusion and distinguish the target from other people, even when they look similar. On the BoBoT benchmark it achieves an average IoU of 73.39%, higher than state-of-the-art algorithms.
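The abstract's key efficiency idea is a detect-then-track control flow: run the expensive detector only to initialize the target or to recover after tracking failure, and run a cheap tracker on every other frame. A minimal sketch of that loop is below; `detect` and `track` are hypothetical placeholder callables, not the thesis's actual Brisk-YOLO detector or its chosen tracker.

```python
def follow(frames, detect, track):
    """Detect-then-track loop: the heavy detector runs only to
    (re)initialize the target box; the cheap tracker handles the rest.

    detect(frame) -> box            # expensive, e.g. a CNN detector
    track(frame, box) -> box|None   # cheap; None signals loss/occlusion
    """
    box = None
    boxes = []
    for frame in frames:
        if box is None:
            box = detect(frame)        # initialize target localization
        else:
            box = track(frame, box)    # lightweight per-frame update
            if box is None:            # target lost or occluded:
                box = detect(frame)    # re-run detector to recover
        boxes.append(box)
    return boxes


# Stub example: the tracker "loses" the target on frame 2, so the
# detector fires only at startup and at that one recovery point.
calls = []

def detect(frame):
    calls.append("d")
    return (0, 0, 10, 10)

def track(frame, box):
    calls.append("t")
    return None if frame == 2 else box

follow([0, 1, 2, 3], detect, track)
print(calls)  # → ['d', 't', 't', 'd', 't']
```

In a real system the recovery step would also consult the re-identification model so the detector's new box is matched to the original target rather than to a similar-looking bystander.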