Dynamic Sign Language Recognition Based on Video Sequence With BLSTM-3D Residual Networks

Sign language recognition aims to recognize meaningful movements of hand gestures and is a significant solution in intelligent communication between the deaf community and hearing societies. However, until now, the current dynamic sign language recognition methods still have some drawbacks with diff...

Bibliographic Details
Main Authors: Yanqiu Liao, Pengwen Xiong, Weidong Min, Weiqiong Min, Jiahao Lu
Format: Article
Language: English
Published: IEEE 2019-01-01
Series: IEEE Access
Subjects: Dynamic sign language recognition; bi-directional LSTM; residual ConvNet; video sequence
Online Access: https://ieeexplore.ieee.org/document/8667292/
id doaj-fa345f1359b04c59bbf2db6c2b9e6245
record_format Article
spelling doaj-fa345f1359b04c59bbf2db6c2b9e6245
2021-04-05T16:59:44Z
eng
IEEE
IEEE Access
ISSN 2169-3536
2019-01-01
Volume 7, pages 38044-38054
DOI 10.1109/ACCESS.2019.2904749
IEEE Xplore document 8667292
Yanqiu Liao (School of Information Engineering, Nanchang University, Nanchang, China)
Pengwen Xiong (School of Information Engineering, Nanchang University, Nanchang, China)
Weidong Min (School of Software, Nanchang University, Nanchang, China; https://orcid.org/0000-0003-2526-2181)
Weiqiong Min (School of Tourism, Jiangxi Science & Technology Normal University, Nanchang, China)
Jiahao Lu (School of Information Engineering, Nanchang University, Nanchang, China)
collection DOAJ
language English
format Article
sources DOAJ
author Yanqiu Liao
Pengwen Xiong
Weidong Min
Weiqiong Min
Jiahao Lu
title Dynamic Sign Language Recognition Based on Video Sequence With BLSTM-3D Residual Networks
publisher IEEE
series IEEE Access
issn 2169-3536
publishDate 2019-01-01
description Sign language recognition aims to recognize meaningful movements of hand gestures and is a significant solution in intelligent communication between the deaf community and hearing societies. However, until now, the current dynamic sign language recognition methods still have some drawbacks with difficulties in recognizing complex hand gestures, low recognition accuracy for most dynamic sign language recognition tasks, and potential problems in training on larger video sequence data. To solve these issues, this paper presents a multimodal dynamic sign language recognition method based on a deep 3-dimensional residual ConvNet and bi-directional LSTM networks, named the BLSTM-3D residual network (B3D ResNet). The method consists of three main parts. First, the hand object is localized in the video frames in order to reduce the time and space complexity of network calculation. Then, the B3D ResNet automatically extracts spatiotemporal features from the video sequences and, after feature analysis, establishes an intermediate score corresponding to each action in the video sequence. Finally, the video sequences are classified to identify the dynamic sign language. Experiments are conducted on test datasets including DEVISIGN_D and SLR_Dataset. The results show that the proposed method obtains state-of-the-art recognition accuracy (89.8% on DEVISIGN_D and 86.9% on SLR_Dataset). In addition, the B3D ResNet can effectively recognize complex hand gestures from larger video sequence data and obtains high recognition accuracy on 500 vocabulary items from Chinese hand sign language.
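The three parts named in the description (localize the hand, score the sequence, classify) can be pictured with a minimal sketch. This is not the authors' implementation: the function names, the bounding-box format, and the `score_fn` callback are all hypothetical, a per-frame scorer stands in for the B3D ResNet, and averaging softmax scores is a simple fusion rule chosen only for illustration.

```python
import math
from typing import Callable, List, Sequence, Tuple

Frame = List[List[float]]          # one grayscale frame as rows of pixel values
Box = Tuple[int, int, int, int]    # hypothetical format: (top, left, bottom, right), end-exclusive


def crop_to_hand(frame: Frame, box: Box) -> Frame:
    """Part 1: keep only the hand region so later stages process fewer pixels."""
    top, left, bottom, right = box
    return [row[left:right] for row in frame[top:bottom]]


def softmax(scores: Sequence[float]) -> List[float]:
    """Turn raw scores into probabilities (shifted by the max for stability)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify_sequence(
    frames: Sequence[Frame],
    boxes: Sequence[Box],
    score_fn: Callable[[Frame], Sequence[float]],
    labels: Sequence[str],
) -> str:
    """Parts 2-3: score each cropped frame, average the scores, pick a label.

    score_fn is a placeholder for a trained network (an assumption, not the
    paper's model); it must return one raw score per entry in labels.
    """
    mean = [0.0] * len(labels)
    for frame, box in zip(frames, boxes):
        probs = softmax(score_fn(crop_to_hand(frame, box)))
        mean = [m + p / len(frames) for m, p in zip(mean, probs)]
    best = max(range(len(labels)), key=mean.__getitem__)
    return labels[best]
```

In a real system the scorer would consume the whole cropped clip and model temporal order in both directions, which is what the bi-directional LSTM contributes; the frame-by-frame average here ignores ordering entirely.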
topic Dynamic sign language recognition
bi-directional LSTM
residual ConvNet
video sequence
url https://ieeexplore.ieee.org/document/8667292/