A Generalized Pyramid Matching Kernel for Human Action Recognition in Realistic Videos

Human action recognition is an increasingly important research topic in the fields of video sensing, analysis and understanding. Unconstrained sensing conditions produce large intra-class variations and inter-class ambiguities in realistic videos, which hinder the improvement of recog...

Bibliographic Details
Main Authors: Wenjun Zhang, Rui Zhang, Weijia Zou, Quan Zhou, Jun Zhu
Format: Article
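Language: English
Published: MDPI AG 2013-10-01
Series: Sensors
Subjects: video analysis; human action recognition; pyramid matching kernel; kernel-based classification method
Online Access: http://www.mdpi.com/1424-8220/13/11/14398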
id doaj-c2bc6df3145745da95f675d38242c3bc
record_format Article
spelling doaj-c2bc6df3145745da95f675d38242c3bc; 2020-11-24T21:01:37Z; eng; MDPI AG; Sensors; ISSN 1424-8220; 2013-10-01; vol. 13, no. 11, pp. 14398-14416; doi:10.3390/s131114398; A Generalized Pyramid Matching Kernel for Human Action Recognition in Realistic Videos; Wenjun Zhang; Rui Zhang; Weijia Zou; Quan Zhou; Jun Zhu; http://www.mdpi.com/1424-8220/13/11/14398; video analysis; human action recognition; pyramid matching kernel; kernel-based classification method
collection DOAJ
language English
format Article
sources DOAJ
author Wenjun Zhang
Rui Zhang
Weijia Zou
Quan Zhou
Jun Zhu
title A Generalized Pyramid Matching Kernel for Human Action Recognition in Realistic Videos
publisher MDPI AG
series Sensors
issn 1424-8220
publishDate 2013-10-01
description Human action recognition is an increasingly important research topic in the fields of video sensing, analysis and understanding. Unconstrained sensing conditions produce large intra-class variations and inter-class ambiguities in realistic videos, which limit the recognition performance of recent vision-based action recognition systems. In this paper, we propose a generalized pyramid matching kernel (GPMK) for recognizing human actions in realistic videos, based on a multi-channel “bag of words” representation constructed from local spatial-temporal features of video clips. As an extension of the spatial-temporal pyramid matching (STPM) kernel, the GPMK leverages heterogeneous visual cues from multiple feature descriptor types and multiple spatial-temporal grid granularity levels to build a valid similarity metric between two video clips for kernel-based classification. Instead of the predefined, fixed weights used in STPM, we present a simple yet effective method that computes adaptive channel weights for the GPMK from the kernel target alignment on training data. The method combines prior knowledge with the data-driven information of the different channels in a principled way. Experimental results on three challenging video datasets (Hollywood2, Youtube and HMDB51) show that the GPMK outperforms the traditional STPM kernel for realistic human action recognition and exceeds the state-of-the-art results reported in the literature.
topic video analysis
human action recognition
pyramid matching kernel
kernel-based classification method
url http://www.mdpi.com/1424-8220/13/11/14398
_version_ 1716777469807165440
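
The description above mentions adaptive channel weights computed from the kernel target alignment on training data. The following is a minimal sketch of that general idea, not the authors' implementation: the abstract does not give the exact weighting formula or how prior knowledge enters, so the +1/-1 target kernel, the clipping of negative alignments, the normalization to unit sum, and all function names here are assumptions made for illustration.

```python
import numpy as np


def target_kernel(labels):
    """Ideal kernel (assumed form): +1 for same-class pairs, -1 otherwise."""
    labels = np.asarray(labels)
    same = labels[:, None] == labels[None, :]
    return np.where(same, 1.0, -1.0)


def alignment(K, T):
    """Kernel target alignment: Frobenius inner product of K and T,
    divided by the product of their Frobenius norms."""
    return float(np.sum(K * T) / (np.linalg.norm(K) * np.linalg.norm(T)))


def channel_weights(kernels, labels):
    """One non-negative weight per channel, proportional to its alignment.

    Negative alignments are clipped to zero (an assumption) so that the
    weighted sum of positive semidefinite kernels stays positive semidefinite.
    """
    T = target_kernel(labels)
    scores = np.array([max(alignment(K, T), 0.0) for K in kernels])
    total = scores.sum()
    if total == 0.0:
        # Fall back to uniform weights when no channel aligns with the labels.
        return np.full(len(kernels), 1.0 / len(kernels))
    return scores / total


def combine(kernels, weights):
    """Weighted sum of per-channel kernel matrices."""
    return sum(w * K for w, K in zip(weights, kernels))


# Toy example: four training clips, two classes, two hypothetical channels.
labels = [0, 0, 1, 1]
K_informative = np.array([[1.0, 1.0, 0.0, 0.0],
                          [1.0, 1.0, 0.0, 0.0],
                          [0.0, 0.0, 1.0, 1.0],
                          [0.0, 0.0, 1.0, 1.0]])
K_uninformative = np.eye(4)  # every clip is similar only to itself

weights = channel_weights([K_informative, K_uninformative], labels)
K_combined = combine([K_informative, K_uninformative], weights)
```

In this toy setting the channel whose kernel matrix reflects the class structure receives the larger weight. The combined matrix could then be passed to a kernel classifier that accepts precomputed kernels.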