Feature extraction and representation for human action recognition

Human action recognition, one of the most important topics in computer vision, has been researched extensively over the past decades; however, it is still regarded as a challenging task, especially in realistic scenarios. The difficulties mainly result from large intra-class variation, background clutter, occlusions, illumination changes and noise. In this thesis, we aim to enhance human action recognition through feature extraction and representation, using both holistic and local methods.

Specifically, we first propose three approaches for the holistic representation of actions. In the first approach, we explicitly extract motion and structure features from video sequences by casting video representation as a 2D image representation problem. In the second and third approaches, we treat video sequences as 3D volumes and propose spatio-temporal pyramid structures to extract multi-scale global features; Gabor filters and steerable filters are extended to the video domain for holistic representations, which have been demonstrated to be successful for action recognition.

With regard to local representations, we first conduct a comprehensive evaluation of local methods, including the bag-of-words (BoW) model, sparse coding, match kernels and classifiers based on image-to-class (I2C) distances. Motivated by the findings of this evaluation, we propose two distinct algorithms for discriminative dimensionality reduction of local spatio-temporal descriptors: the first is based on image-to-class distances, while the second exploits local Gaussians. We evaluate the proposed methods through extensive experiments on widely used human action datasets, including KTH, IXMAS, UCF Sports, UCF YouTube and HMDB51. Experimental results demonstrate the effectiveness of our methods for action recognition.
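
As a point of reference for the local pipeline mentioned in the abstract, the following is a minimal, generic bag-of-words sketch in Python: local spatio-temporal descriptors are quantised against a k-means codebook, and each video becomes a normalised word histogram fed to a linear SVM. This illustrates the standard BoW recipe only and is not the thesis implementation; the extract_descriptors stub and the 162-dimensional random features are placeholder assumptions standing in for real interest-point descriptors such as HOG/HOF.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

def extract_descriptors(video):
    # Placeholder: a real system would detect spatio-temporal interest points
    # and compute descriptors (e.g. HOG/HOF) around them.
    n_points = int(rng.integers(50, 100))
    return rng.normal(size=(n_points, 162))

def bow_histogram(descriptors, codebook):
    # Assign each descriptor to its nearest visual word and build an
    # L1-normalised histogram of word counts.
    words = codebook.predict(descriptors)
    hist = np.bincount(words, minlength=codebook.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)

# Toy training set: 20 "videos" with alternating labels.
train_videos = [None] * 20
train_labels = np.array([i % 2 for i in range(20)])
train_desc = [extract_descriptors(v) for v in train_videos]

# 1) Learn a visual codebook from all training descriptors.
codebook = KMeans(n_clusters=64, n_init=5, random_state=0).fit(np.vstack(train_desc))

# 2) Encode every video as a word histogram and train a linear SVM.
X_train = np.array([bow_histogram(d, codebook) for d in train_desc])
clf = LinearSVC().fit(X_train, train_labels)

# 3) A new clip is encoded and classified the same way.
x_test = bow_histogram(extract_descriptors(None), codebook)
print(clf.predict(x_test.reshape(1, -1)))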

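Similarly, the image-to-class (I2C) idea listed among the evaluated local methods can be illustrated with a short video-to-class distance classifier: every descriptor of a test clip is matched to its nearest neighbour in the pooled training descriptors of each class, and the class with the smallest accumulated distance wins. The class names and random descriptor pools below are purely illustrative assumptions, not data or code from the thesis.

import numpy as np
from scipy.spatial.distance import cdist

rng = np.random.default_rng(1)

# Pooled training descriptors per class (random placeholders standing in for
# real local spatio-temporal descriptors).
class_pools = {
    "walk": rng.normal(loc=0.0, size=(300, 64)),
    "run": rng.normal(loc=1.0, size=(300, 64)),
}

def video_to_class_distance(test_desc, pool):
    # Sum of nearest-neighbour Euclidean distances from each test descriptor
    # to the class's descriptor pool.
    return cdist(test_desc, pool).min(axis=1).sum()

def classify(test_desc):
    scores = {label: video_to_class_distance(test_desc, pool)
              for label, pool in class_pools.items()}
    return min(scores, key=scores.get)

# A clip whose descriptors are drawn near the "run" pool should be labelled "run".
test_desc = rng.normal(loc=1.0, size=(80, 64))
print(classify(test_desc))
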
Bibliographic Details
Main Author: Zhen, Xiantong
Other Authors: Shao, Ling
Published: University of Sheffield, 2013
Subjects: 006.37
Online Access: http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.589361
http://etheses.whiterose.ac.uk/5141/
Format: Electronic Thesis or Dissertation