RGB-D Human Action Recognition of Deep Feature Enhancement and Fusion Using Two-Stream ConvNet

Action recognition is an important research direction in computer vision. Recognition performance based on RGB video is easily affected by factors such as background and lighting, whereas depth video reduces this interference and can improve recognition accuracy. This paper therefore makes full use of video and depth skeleton data and proposes an RGB-D action recognition two-stream network (SV-GCN), a two-stream architecture that operates on two different data modalities. The skeleton stream (S-Stream) is a proposed Nonlocal-STGCN, which adds nonlocal operations to capture dependencies among a wider range of joints and thus provides richer skeleton-point features to the model. The video stream (V-Stream) is a proposed Dilated-SlowFastNet, which replaces the traditional random sampling layer with dilated convolutional layers to make better use of depth features. Finally, the information from the two streams is fused to perform action recognition. Experimental results on the NTU-RGB+D dataset show that the proposed method significantly improves recognition accuracy and outperforms ST-GCN and SlowFastNet under both the cross-subject (CS) and cross-view (CV) evaluation protocols.
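The abstract states that the two streams are fused to produce the final recognition result, but does not specify the fusion scheme. Below is a minimal sketch of one common choice, score-level ("late") fusion, in plain Python. The class names, score vectors, and the equal 0.5/0.5 weighting are illustrative assumptions, not values from the paper.

```python
def fuse_scores(s_stream, v_stream, w_s=0.5, w_v=0.5):
    """Weighted average of per-class scores from the two streams.

    s_stream: per-class scores from the skeleton stream (S-Stream)
    v_stream: per-class scores from the video stream (V-Stream)
    """
    assert len(s_stream) == len(v_stream)
    return [w_s * s + w_v * v for s, v in zip(s_stream, v_stream)]


def predict(fused, class_names):
    """Return the class name with the highest fused score."""
    best = max(range(len(fused)), key=lambda i: fused[i])
    return class_names[best]


# Hypothetical softmax outputs for three hypothetical action classes.
classes = ["wave", "jump", "sit"]
s_scores = [0.7, 0.2, 0.1]   # skeleton stream is confident in "wave"
v_scores = [0.4, 0.5, 0.1]   # video stream slightly prefers "jump"

fused = fuse_scores(s_scores, v_scores)
print(predict(fused, classes))
```

With equal weights the skeleton stream's stronger confidence wins out here; in practice the weights (or a learned fusion layer) would be chosen on validation data.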


Bibliographic Details
Main Authors: Yun Liu, Ruidi Ma, Hui Li, Chuanxu Wang, Ye Tao
Format: Article
Language: English
Published: Hindawi Limited, 2021-01-01
Series: Journal of Sensors
Online Access: http://dx.doi.org/10.1155/2021/8864870
ISSN: 1687-725X, 1687-7268