Efficient Multiple Path Search for Action Tube Detection in Videos

Bibliographic Details
Main Author: Erick Hendra Putra Alwando
Other Authors: Wen-Hsien Fang
Format: Others
Language: en_US
Published: 2017
Online Access: http://ndltd.ncl.edu.tw/handle/08603454653555086253
Description
Summary: Master's === National Taiwan University of Science and Technology === Department of Electronic Engineering === 105 === This thesis presents an efficient convolutional neural network (CNN)-based approach to detecting multiple spatio-temporal action tubes in videos. First, a new fusion strategy combines the appearance and flow information from two-stream CNN-based networks with motion saliency to generate the action detection scores. Next, an efficient multiple path search (MPS) algorithm is developed to find multiple paths simultaneously in a single run. In the forward message-passing stage of MPS, each node stores information on a prescribed number of paths, based on the accumulated scores determined in the previous stages. A backward path-tracing stage is then invoked to recover all of the paths at once, fully reusing the information generated in the forward pass without repeating the search, which reduces the computational complexity. Moreover, to rectify potentially inaccurate bounding boxes, a video localization refinement (VLR) scheme is introduced to further boost the detection accuracy. Simulations show that the proposed MPS outperforms the main state-of-the-art works on the widely used UCF-101 and J-HMDB datasets, and that adding VLR improves the performance of MPS further.
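
The forward and backward stages described in the summary can be illustrated with a generic K-best dynamic-programming search over per-frame detections. The following is a minimal sketch under stated assumptions, not the thesis's MPS implementation: the function names (`iou`, `multi_path_search`), the link score (detection score plus a weighted box overlap), and the `overlap_weight` parameter are all hypothetical choices made for illustration. Each node keeps up to `k` accumulated-score entries during the forward pass, and the backward pass recovers the `k` best paths from the stored back-pointers without searching again.

```python
import heapq


def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def multi_path_search(frames, k=2, overlap_weight=1.0):
    """Find up to k best-scoring paths through per-frame detections.

    frames: non-empty list over time; each frame is a list of (box, score).
    Returns a list of (total_score, [box index per frame]), best first.
    The scoring rule here is an assumption, not taken from the thesis.
    """
    # Forward pass: table[t][i] holds up to k entries for box i in frame t,
    # each entry being (accumulated_score, previous_box, previous_rank).
    table = [[[(score, -1, -1)] for _, score in frames[0]]]
    for t in range(1, len(frames)):
        layer = []
        for box, score in frames[t]:
            candidates = []
            for j, (prev_box, _) in enumerate(frames[t - 1]):
                link = overlap_weight * iou(prev_box, box)
                for r, (acc, _, _) in enumerate(table[t - 1][j]):
                    candidates.append((acc + score + link, j, r))
            layer.append(heapq.nlargest(k, candidates))
        table.append(layer)

    # Backward pass: pick the k best final entries and follow back-pointers.
    last = len(frames) - 1
    finals = [
        (entry[0], i, r)
        for i, entries in enumerate(table[last])
        for r, entry in enumerate(entries)
    ]
    paths = []
    for total, i, r in heapq.nlargest(k, finals):
        path = []
        for t in range(last, -1, -1):
            path.append(i)
            _, i, r = table[t][i][r]
        paths.append((total, path[::-1]))
    return paths


# Two hypothetical actors that stay in place for three frames.
frames = [
    [((0, 0, 10, 10), 0.9), ((50, 50, 60, 60), 0.8)],
    [((0, 0, 10, 10), 0.9), ((50, 50, 60, 60), 0.8)],
    [((0, 0, 10, 10), 0.9), ((50, 50, 60, 60), 0.8)],
]
paths = multi_path_search(frames, k=2)
```

With this toy input the two returned paths follow the two separate boxes. Note that a plain K-best search like this one can return paths that share detections when `k` exceeds the number of distinct actors; how the thesis handles that case is not stated in the summary.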