The clash between two worlds in human action recognition: supervised feature training vs Recurrent ConvNet

Indiana University-Purdue University Indianapolis (IUPUI) === Action recognition has been an active research topic for over three decades. There are various applications of action recognition, such as surveillance, human-computer interaction, and content-based retrieval. Recently, research focuses...

Bibliographic Details
Main Author: Raptis, Konstantinos
Other Authors: Tsechpenakis, Gavriil
Language: en_US
Published: 2017
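Subjects: Action Recognition; Dense Trajectories; R-CNN; LSTM RNN; Convolution Neural Networks; Recurrent Neural Networks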
Online Access: http://hdl.handle.net/1805/11827
https://doi.org/10.7912/C2CW7G
id ndltd-IUPUI-oai-scholarworks.iupui.edu-1805-11827
record_format oai_dc
spelling ndltd-IUPUI-oai-scholarworks.iupui.edu-1805-11827 2019-05-10T15:21:46Z The clash between two worlds in human action recognition: supervised feature training vs Recurrent ConvNet Raptis, Konstantinos Tsechpenakis, Gavriil Action Recognition Dense Trajectories R-CNN LSTM RNN Convolution Neural Networks Recurrent Neural Networks Indiana University-Purdue University Indianapolis (IUPUI) Action recognition has been an active research topic for over three decades. There are various applications of action recognition, such as surveillance, human-computer interaction, and content-based retrieval. Recent research focuses on datasets of movies, web videos, and TV shows. The nature of these datasets makes action recognition very challenging due to scene variability and complexity, namely background clutter, occlusions, viewpoint changes, fast irregular motion, and a large spatio-temporal search space (articulation configurations and motions). The use of local space and time image features shows promising results, avoiding the cumbersome and often inaccurate frame-by-frame segmentation (boundary estimation). We focus on two state-of-the-art methods for the action classification problem: dense trajectories and recurrent neural networks (RNN). Dense trajectories use typical supervised training (e.g., with Support Vector Machines) of features such as 3D-SIFT, extended SURF, HOG3D, and local trinary patterns; the main idea is to densely sample these features in each frame and track them in the sequence based on optical flow. On the other hand, the deep neural network uses the input frames to detect action and produce part proposals, i.e., estimate information on body parts (shapes and locations). We compare these two approaches, which are indicative of what is used today, both qualitatively and numerically, and describe our conclusions with respect to accuracy and efficiency.
2017-01-18T21:09:19Z 2017-01-18T21:09:19Z 2016-11-28 Thesis http://hdl.handle.net/1805/11827 https://doi.org/10.7912/C2CW7G en_US Attribution 3.0 United States http://creativecommons.org/licenses/by/3.0/us/
collection NDLTD
language en_US
sources NDLTD
topic Action Recognition
Dense Trajectories
R-CNN
LSTM RNN
Convolution Neural Networks
Recurrent Neural Networks
description Indiana University-Purdue University Indianapolis (IUPUI) === Action recognition has been an active research topic for over three decades. There are various applications of action recognition, such as surveillance, human-computer interaction, and content-based retrieval. Recent research focuses on datasets of movies, web videos, and TV shows. The nature of these datasets makes action recognition very challenging due to scene variability and complexity, namely background clutter, occlusions, viewpoint changes, fast irregular motion, and a large spatio-temporal search space (articulation configurations and motions). The use of local space and time image features shows promising results, avoiding the cumbersome and often inaccurate frame-by-frame segmentation (boundary estimation). We focus on two state-of-the-art methods for the action classification problem: dense trajectories and recurrent neural networks (RNN). Dense trajectories use typical supervised training (e.g., with Support Vector Machines) of features such as 3D-SIFT, extended SURF, HOG3D, and local trinary patterns; the main idea is to densely sample these features in each frame and track them in the sequence based on optical flow. On the other hand, the deep neural network uses the input frames to detect action and produce part proposals, i.e., estimate information on body parts (shapes and locations). We compare these two approaches, which are indicative of what is used today, both qualitatively and numerically, and describe our conclusions with respect to accuracy and efficiency.
author2 Tsechpenakis, Gavriil
author_facet Tsechpenakis, Gavriil
Raptis, Konstantinos
author Raptis, Konstantinos
author_sort Raptis, Konstantinos
title The clash between two worlds in human action recognition: supervised feature training vs Recurrent ConvNet
publishDate 2017
url http://hdl.handle.net/1805/11827
https://doi.org/10.7912/C2CW7G