Visual Speech Recognition with Lightweight Psychologically Motivated Gabor Features

Extraction of relevant lip features is of continuing interest in the visual speech domain. Using end-to-end feature extraction can produce good results, but at the cost of the results being difficult for humans to comprehend and relate to. We present a new, lightweight feature extraction approach, m...

Bibliographic Details
Main Authors:	Xuejie Zhang, Yan Xu, Andrew K. Abel, Leslie S. Smith, Roger Watt, Amir Hussain, Chengxiang Gao
Format:	Article
Language:	English
Published:	MDPI AG 2020-12-01
Series:	Entropy
Subjects:	speech recognition; image processing; Gabor features; lip reading; explainable
Online Access:	https://www.mdpi.com/1099-4300/22/12/1367

id	doaj-fd5edd81e58d488ba8b2089e788dae83
record_format	Article
spelling	doaj-fd5edd81e58d488ba8b2089e788dae83; 2020-12-04T00:06:22Z; eng; MDPI AG; Entropy; 1099-4300; 2020-12-01; 22; 1367; 1367; 10.3390/e22121367; Visual Speech Recognition with Lightweight Psychologically Motivated Gabor Features; Xuejie Zhang (Department of Computer Science and Software Engineering, Xi’an Jiaotong-Liverpool University, Suzhou 215123, China); Yan Xu (Department of Computer Science and Software Engineering, Xi’an Jiaotong-Liverpool University, Suzhou 215123, China); Andrew K. Abel (Department of Computer Science and Software Engineering, Xi’an Jiaotong-Liverpool University, Suzhou 215123, China); Leslie S. Smith (Faculty of Natural Sciences, University of Stirling, Stirling FK9 4AL, UK); Roger Watt (Faculty of Natural Sciences, University of Stirling, Stirling FK9 4AL, UK); Amir Hussain (School of Computing, Edinburgh Napier University, Edinburgh EH11 4DY, UK); Chengxiang Gao (Department of Computer Science and Software Engineering, Xi’an Jiaotong-Liverpool University, Suzhou 215123, China)
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Xuejie Zhang; Yan Xu; Andrew K. Abel; Leslie S. Smith; Roger Watt; Amir Hussain; Chengxiang Gao
spellingShingle	Xuejie Zhang; Yan Xu; Andrew K. Abel; Leslie S. Smith; Roger Watt; Amir Hussain; Chengxiang Gao; Visual Speech Recognition with Lightweight Psychologically Motivated Gabor Features; Entropy; speech recognition; image processing; Gabor features; lip reading; explainable
author_facet	Xuejie Zhang; Yan Xu; Andrew K. Abel; Leslie S. Smith; Roger Watt; Amir Hussain; Chengxiang Gao
author_sort	Xuejie Zhang
title	Visual Speech Recognition with Lightweight Psychologically Motivated Gabor Features
publisher	MDPI AG
series	Entropy
issn	1099-4300
publishDate	2020-12-01
description	Extraction of relevant lip features is of continuing interest in the visual speech domain. Using end-to-end feature extraction can produce good results, but at the cost of the results being difficult for humans to comprehend and relate to. We present a new, lightweight feature extraction approach, motivated by human-centric glimpse-based psychological research into facial barcodes, and demonstrate that these simple, easy-to-extract 3D geometric features (produced using Gabor-based image patches) can successfully be used for speech recognition with LSTM-based machine learning. This approach can successfully extract low-dimensionality lip parameters with a minimum of processing. One key difference between using these Gabor-based features and using other features, such as traditional DCT or the currently fashionable CNN features, is that these are human-centric features that can be visualised and analysed by humans. This means that it is easier to explain and visualise the results. They can also be used for reliable speech recognition, as demonstrated using the Grid corpus. Results for overlapping speakers using our lightweight system gave a recognition rate of over 82%, which compares well to less explainable features in the literature.
topic	speech recognition; image processing; Gabor features; lip reading; explainable
url	https://www.mdpi.com/1099-4300/22/12/1367
