Dynamic Sign Language Recognition Based on Video Sequence With BLSTM-3D Residual Networks

Sign language recognition aims to recognize meaningful movements of hand gestures and is a significant solution in intelligent communication between the deaf community and hearing societies. However, until now, the current dynamic sign language recognition methods still have some drawbacks with diff...

Bibliographic Details
Main Authors:	Yanqiu Liao, Pengwen Xiong, Weidong Min, Weiqiong Min, Jiahao Lu
Format:	Article
Language:	English
Published:	IEEE 2019-01-01
Series:	IEEE Access
Subjects:	Dynamic sign language recognition; bi-directional LSTM; residual ConvNet; video sequence
Online Access:	https://ieeexplore.ieee.org/document/8667292/

id	doaj-fa345f1359b04c59bbf2db6c2b9e6245
record_format	Article
spelling	doaj-fa345f1359b04c59bbf2db6c2b9e6245 | 2021-04-05T16:59:44Z | eng | IEEE | IEEE Access | 2169-3536 | 2019-01-01 | 7 | 38044 | 38054 | 10.1109/ACCESS.2019.2904749 | 8667292 | Dynamic Sign Language Recognition Based on Video Sequence With BLSTM-3D Residual Networks | Yanqiu Liao | 0 | Pengwen Xiong | 1 | Weidong Min | 2 | https://orcid.org/0000-0003-2526-2181 | Weiqiong Min | 3 | Jiahao Lu | 4 | School of Information Engineering, Nanchang University, Nanchang, China | School of Information Engineering, Nanchang University, Nanchang, China | School of Software, Nanchang University, Nanchang, China | School of Tourism, Jiangxi Science & Technology Normal University, Nanchang, China | School of Information Engineering, Nanchang University, Nanchang, China | Sign language recognition aims to recognize meaningful movements of hand gestures and is a significant solution in intelligent communication between the deaf community and hearing societies. However, until now, the current dynamic sign language recognition methods still have some drawbacks with difficulties of recognizing complex hand gestures, low recognition accuracy for most dynamic sign language recognition, and potential problems in larger video sequence data training. In order to solve these issues, this paper presents a multimodal dynamic sign language recognition method based on a deep 3-dimensional residual ConvNet and bi-directional LSTM networks, which is named as BLSTM-3D residual network (B3D ResNet). This method consists of three main parts. First, the hand object is localized in the video frames in order to reduce the time and space complexity of network calculation. Then, the B3D ResNet automatically extracts the spatiotemporal features from the video sequences and establishes an intermediate score corresponding to each action in the video sequence after feature analysis. Finally, by classifying the video sequences, the dynamic sign language is accurately identified. The experiment is conducted on test datasets, including DEVISIGN_D dataset and SLR_Dataset. The results show that the proposed method can obtain state-of-the-art recognition accuracy (89.8% on the DEVISIGN_D dataset and 86.9% on SLR_Dataset). In addition, the B3D ResNet can effectively recognize complex hand gestures through larger video sequence data, and obtain high recognition accuracy for 500 vocabularies from Chinese hand sign language. | https://ieeexplore.ieee.org/document/8667292/ | Dynamic sign language recognition | bi-directional LSTM | residual ConvNet | video sequence
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Yanqiu Liao, Pengwen Xiong, Weidong Min, Weiqiong Min, Jiahao Lu
spellingShingle	Yanqiu Liao Pengwen Xiong Weidong Min Weiqiong Min Jiahao Lu Dynamic Sign Language Recognition Based on Video Sequence With BLSTM-3D Residual Networks IEEE Access Dynamic sign language recognition bi-directional LSTM residual ConvNet video sequence
author_facet	Yanqiu Liao, Pengwen Xiong, Weidong Min, Weiqiong Min, Jiahao Lu
author_sort	Yanqiu Liao
title	Dynamic Sign Language Recognition Based on Video Sequence With BLSTM-3D Residual Networks
title_short	Dynamic Sign Language Recognition Based on Video Sequence With BLSTM-3D Residual Networks
title_full	Dynamic Sign Language Recognition Based on Video Sequence With BLSTM-3D Residual Networks
title_fullStr	Dynamic Sign Language Recognition Based on Video Sequence With BLSTM-3D Residual Networks
title_full_unstemmed	Dynamic Sign Language Recognition Based on Video Sequence With BLSTM-3D Residual Networks
title_sort	dynamic sign language recognition based on video sequence with blstm-3d residual networks
publisher	IEEE
series	IEEE Access
issn	2169-3536
publishDate	2019-01-01
description	Sign language recognition aims to recognize meaningful movements of hand gestures and is a significant solution in intelligent communication between the deaf community and hearing societies. However, until now, the current dynamic sign language recognition methods still have some drawbacks with difficulties of recognizing complex hand gestures, low recognition accuracy for most dynamic sign language recognition, and potential problems in larger video sequence data training. In order to solve these issues, this paper presents a multimodal dynamic sign language recognition method based on a deep 3-dimensional residual ConvNet and bi-directional LSTM networks, which is named as BLSTM-3D residual network (B3D ResNet). This method consists of three main parts. First, the hand object is localized in the video frames in order to reduce the time and space complexity of network calculation. Then, the B3D ResNet automatically extracts the spatiotemporal features from the video sequences and establishes an intermediate score corresponding to each action in the video sequence after feature analysis. Finally, by classifying the video sequences, the dynamic sign language is accurately identified. The experiment is conducted on test datasets, including DEVISIGN_D dataset and SLR_Dataset. The results show that the proposed method can obtain state-of-the-art recognition accuracy (89.8% on the DEVISIGN_D dataset and 86.9% on SLR_Dataset). In addition, the B3D ResNet can effectively recognize complex hand gestures through larger video sequence data, and obtain high recognition accuracy for 500 vocabularies from Chinese hand sign language.
topic	Dynamic sign language recognition; bi-directional LSTM; residual ConvNet; video sequence
url	https://ieeexplore.ieee.org/document/8667292/
work_keys_str_mv	AT yanqiuliao dynamicsignlanguagerecognitionbasedonvideosequencewithblstm3dresidualnetworks AT pengwenxiong dynamicsignlanguagerecognitionbasedonvideosequencewithblstm3dresidualnetworks AT weidongmin dynamicsignlanguagerecognitionbasedonvideosequencewithblstm3dresidualnetworks AT weiqiongmin dynamicsignlanguagerecognitionbasedonvideosequencewithblstm3dresidualnetworks AT jiahaolu dynamicsignlanguagerecognitionbasedonvideosequencewithblstm3dresidualnetworks
_version_	1721540544849510400

Similar Items