RGB-D Human Action Recognition of Deep Feature Enhancement and Fusion Using Two-Stream ConvNet

Action recognition is an important research direction in computer vision. Recognition performance based on RGB video is easily affected by factors such as background and lighting, whereas depth video reduces this interference and can improve recognition accuracy. This paper therefore makes full use of video and depth skeleton data and proposes an RGB-D action recognition two-stream network (SV-GCN), a two-stream architecture that operates on two different data modalities. The skeleton stream (S-Stream) is a proposed Nonlocal-STGCN, which adds nonlocal operations to capture dependencies among a wider range of joints and thus provides richer skeleton-point features to the model. The video stream (V-Stream) is a proposed Dilated-SlowFastNet, which replaces the traditional random sampling layer with dilated convolutional layers to make better use of depth features. Finally, the information from the two streams is fused to perform action recognition. Experimental results on the NTU-RGB+D dataset show that the proposed method significantly improves recognition accuracy and outperforms ST-GCN and SlowFastNet under both the cross-subject (CS) and cross-view (CV) evaluation protocols.
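The abstract states that the two streams are fused to produce the final recognition result, but does not specify the fusion scheme. Below is a minimal sketch of one common choice, score-level ("late") fusion, in plain Python. The class names, score vectors, and the equal 0.5/0.5 weighting are illustrative assumptions, not values from the paper.

```python
def fuse_scores(s_stream, v_stream, w_s=0.5, w_v=0.5):
    """Weighted average of per-class scores from the two streams.

    s_stream: per-class scores from the skeleton stream (S-Stream)
    v_stream: per-class scores from the video stream (V-Stream)
    """
    assert len(s_stream) == len(v_stream)
    return [w_s * s + w_v * v for s, v in zip(s_stream, v_stream)]


def predict(fused, class_names):
    """Return the class name with the highest fused score."""
    best = max(range(len(fused)), key=lambda i: fused[i])
    return class_names[best]


# Hypothetical softmax outputs for three hypothetical action classes.
classes = ["wave", "jump", "sit"]
s_scores = [0.7, 0.2, 0.1]   # skeleton stream is confident in "wave"
v_scores = [0.4, 0.5, 0.1]   # video stream slightly prefers "jump"

fused = fuse_scores(s_scores, v_scores)
print(predict(fused, classes))
```

With equal weights the skeleton stream's stronger confidence wins out here; in practice the weights (or a learned fusion layer) would be chosen on validation data.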


Bibliographic Details
Main Authors: Yun Liu, Ruidi Ma, Hui Li, Chuanxu Wang, Ye Tao
Format: Article
Language: English
Published: Hindawi Limited, 2021-01-01
Series: Journal of Sensors
Online Access: http://dx.doi.org/10.1155/2021/8864870
ISSN: 1687-725X, 1687-7268