A Generalized Pyramid Matching Kernel for Human Action Recognition in Realistic Videos
Human action recognition is an increasingly important research topic in the fields of video sensing, analysis and understanding. Because of unconstrained sensing conditions, realistic videos exhibit large intra-class variations and inter-class ambiguities, which hinder the improvement of recog...
Main Authors: | Wenjun Zhang, Rui Zhang, Weijia Zou, Quan Zhou, Jun Zhu |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG 2013-10-01 |
Series: | Sensors |
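Subjects: | video analysis; human action recognition; pyramid matching kernel; kernel-based classification method |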
Online Access: | http://www.mdpi.com/1424-8220/13/11/14398 |
id |
doaj-c2bc6df3145745da95f675d38242c3bc |
---|---|
record_format |
Article |
spelling |
doaj-c2bc6df3145745da95f675d38242c3bc; 2020-11-24T21:01:37Z; eng; MDPI AG; Sensors; 1424-8220; 2013-10-01; 13; 11; 14398; 14416; 10.3390/s131114398; A Generalized Pyramid Matching Kernel for Human Action Recognition in Realistic Videos; Wenjun Zhang; Rui Zhang; Weijia Zou; Quan Zhou; Jun Zhu; Human action recognition is an increasingly important research topic in the fields of video sensing, analysis and understanding. Because of unconstrained sensing conditions, realistic videos exhibit large intra-class variations and inter-class ambiguities, which hinder the improvement of recognition performance for recent vision-based action recognition systems. In this paper, we propose a generalized pyramid matching kernel (GPMK) for recognizing human actions in realistic videos, based on a multi-channel “bag of words” representation constructed from local spatial-temporal features of video clips. As an extension to the spatial-temporal pyramid matching (STPM) kernel, the GPMK leverages heterogeneous visual cues in multiple feature descriptor types and spatial-temporal grid granularity levels to build a valid similarity metric between two video clips for kernel-based classification. Instead of the predefined and fixed weights used in STPM, we present a simple, yet effective, method to compute adaptive channel weights of GPMK based on the kernel target alignment from training data. This method incorporates prior knowledge and the data-driven information of different channels in a principled way. The experimental results on three challenging video datasets (i.e., Hollywood2, Youtube and HMDB51) show that the GPMK outperforms the traditional STPM kernel for realistic human action recognition and exceeds the state-of-the-art results in the literature.; http://www.mdpi.com/1424-8220/13/11/14398; video analysis; human action recognition; pyramid matching kernel; kernel-based classification method |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Wenjun Zhang Rui Zhang Weijia Zou Quan Zhou Jun Zhu |
spellingShingle |
Wenjun Zhang Rui Zhang Weijia Zou Quan Zhou Jun Zhu A Generalized Pyramid Matching Kernel for Human Action Recognition in Realistic Videos Sensors video analysis human action recognition pyramid matching kernel kernel-based classification method |
author_facet |
Wenjun Zhang Rui Zhang Weijia Zou Quan Zhou Jun Zhu |
author_sort |
Wenjun Zhang |
title |
A Generalized Pyramid Matching Kernel for Human Action Recognition in Realistic Videos |
title_sort |
generalized pyramid matching kernel for human action recognition in realistic videos |
publisher |
MDPI AG |
series |
Sensors |
issn |
1424-8220 |
publishDate |
2013-10-01 |
description |
Human action recognition is an increasingly important research topic in the fields of video sensing, analysis and understanding. Because of unconstrained sensing conditions, realistic videos exhibit large intra-class variations and inter-class ambiguities, which hinder the improvement of recognition performance for recent vision-based action recognition systems. In this paper, we propose a generalized pyramid matching kernel (GPMK) for recognizing human actions in realistic videos, based on a multi-channel “bag of words” representation constructed from local spatial-temporal features of video clips. As an extension to the spatial-temporal pyramid matching (STPM) kernel, the GPMK leverages heterogeneous visual cues in multiple feature descriptor types and spatial-temporal grid granularity levels to build a valid similarity metric between two video clips for kernel-based classification. Instead of the predefined and fixed weights used in STPM, we present a simple, yet effective, method to compute adaptive channel weights of GPMK based on the kernel target alignment from training data. This method incorporates prior knowledge and the data-driven information of different channels in a principled way. The experimental results on three challenging video datasets (i.e., Hollywood2, Youtube and HMDB51) show that the GPMK outperforms the traditional STPM kernel for realistic human action recognition and exceeds the state-of-the-art results in the literature. |
topic |
video analysis human action recognition pyramid matching kernel kernel-based classification method |
url |
http://www.mdpi.com/1424-8220/13/11/14398 |
work_keys_str_mv |
AT wenjunzhang ageneralizedpyramidmatchingkernelforhumanactionrecognitioninrealisticvideos AT ruizhang ageneralizedpyramidmatchingkernelforhumanactionrecognitioninrealisticvideos AT weijiazou ageneralizedpyramidmatchingkernelforhumanactionrecognitioninrealisticvideos AT quanzhou ageneralizedpyramidmatchingkernelforhumanactionrecognitioninrealisticvideos AT junzhu ageneralizedpyramidmatchingkernelforhumanactionrecognitioninrealisticvideos AT wenjunzhang generalizedpyramidmatchingkernelforhumanactionrecognitioninrealisticvideos AT ruizhang generalizedpyramidmatchingkernelforhumanactionrecognitioninrealisticvideos AT weijiazou generalizedpyramidmatchingkernelforhumanactionrecognitioninrealisticvideos AT quanzhou generalizedpyramidmatchingkernelforhumanactionrecognitioninrealisticvideos AT junzhu generalizedpyramidmatchingkernelforhumanactionrecognitioninrealisticvideos |
_version_ |
1716777469807165440 |
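
The abstract describes adaptive channel weights derived from kernel target alignment, but the record does not give the formulation. The sketch below is only an illustration of the general idea, not the paper's method: it assumes binary labels in {-1, +1}, one precomputed kernel matrix per channel, and weights taken as the normalized, non-negative alignment scores. The function names (`kernel_target_alignment`, `alignment_weights`, `combine_kernels`) and the toy matrices are hypothetical, and the prior-knowledge term mentioned in the abstract is omitted.

```python
import math


def frobenius_inner(a, b):
    """Frobenius inner product of two equally sized square matrices."""
    return sum(x * y for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


def kernel_target_alignment(kernel, labels):
    """Alignment between a kernel matrix and the ideal kernel y y^T.

    Assumes labels are in {-1, +1}; returns a value in [-1, 1].
    """
    target = [[yi * yj for yj in labels] for yi in labels]
    numerator = frobenius_inner(kernel, target)
    denominator = math.sqrt(
        frobenius_inner(kernel, kernel) * frobenius_inner(target, target)
    )
    return numerator / denominator if denominator > 0 else 0.0


def alignment_weights(channel_kernels, labels):
    """Illustrative weighting: clip alignments at zero and normalize to sum to 1."""
    scores = [max(kernel_target_alignment(k, labels), 0.0) for k in channel_kernels]
    total = sum(scores)
    if total == 0:
        # No channel aligns with the labels: fall back to uniform weights.
        return [1.0 / len(scores)] * len(scores)
    return [s / total for s in scores]


def combine_kernels(channel_kernels, weights):
    """Weighted sum of channel kernels (non-negative weights keep it a valid kernel)."""
    n = len(channel_kernels[0])
    return [
        [sum(w * k[i][j] for w, k in zip(weights, channel_kernels)) for j in range(n)]
        for i in range(n)
    ]


# Toy data: four clips, two per class, and two hypothetical channels.
labels = [1, 1, -1, -1]
block_kernel = [  # channel that groups clips of the same class
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
identity_kernel = [  # channel that only matches each clip to itself
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]

weights = alignment_weights([block_kernel, identity_kernel], labels)
combined = combine_kernels([block_kernel, identity_kernel], weights)
```

In this toy setup the class-structured channel receives the larger weight, and the combined matrix could be passed to any classifier that accepts a precomputed kernel.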