A Lip-Reading System with Deep Learning
Master's === National Dong Hwa University === Department of Computer Science and Information Engineering === 106
Main Authors: | Tsang-Yu Cheng, 鄭滄宇 |
---|---|
Other Authors: | Cheng-Chin Chiang |
Format: | Others |
Language: | zh-TW |
Published: | 2018 |
Online Access: | http://ndltd.ncl.edu.tw/handle/9jhh8b |
id |
ndltd-TW-106NDHU5392034 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-TW-106NDHU53920342019-05-16T01:07:57Z http://ndltd.ncl.edu.tw/handle/9jhh8b A Lip-Reading System with Deep Learning 深度學習之唇語辨識系統 Tsang-Yu Cheng 鄭滄宇 Master's === National Dong Hwa University === Department of Computer Science and Information Engineering === 106 Cheng-Chin Chiang 江政欽 2018 degree thesis ; thesis 36 zh-TW |
collection |
NDLTD |
language |
zh-TW |
format |
Others |
sources |
NDLTD |
description |
Master's === National Dong Hwa University === Department of Computer Science and Information Engineering === 106 === In recent years, with the rapid development of technology, convenient high-tech products have become increasingly diverse. However, while enjoying the convenience of these products, users run the risk of identity theft. When a product requires a password for authentication, a user's typing can be watched or recorded by bystanders. With the maturity of face recognition technology, some products, such as smartphones, have adopted face recognition to authenticate users and have spawned more convenient services, such as electronic payments. However, when users wear accessories on their faces, these systems may fail to recognize them. In addition, two people who look alike can easily defeat the authentication. To address these weaknesses of face authentication, this thesis designs a lip-reading system that allows users to input passwords by lip motion without uttering a sound. Since others cannot hear the sound, the chance of the password being overheard or recorded is reduced. Even if others learn the password, the lip motions of different people speaking the same password still differ, thereby enhancing the reliability of identity authentication. This study uses the MIRACL-VC1 database for its training and testing samples. The proposed method feeds the lip regions detected in video frames into multiple Convolutional Neural Networks (CNNs) for feature extraction and recognition. The voting mechanism of the ensemble method is then applied to integrate the recognition results from these multiple CNNs into the final recognition result. Compared with existing machine-learning methods, this thesis uses several network models that complement each other, requiring fewer training samples to achieve better performance. Using a database of ten word-based lip commands and ten phrase-based lip commands, our system achieves recognition rates of 62% and 58%, respectively.
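The ensemble step the abstract describes — integrating the recognition results of multiple CNNs by voting — can be sketched as simple majority voting over the labels each network predicts. This is a minimal illustration under stated assumptions, not the thesis's implementation: the command names, the number of networks, and the stand-in predictions below are hypothetical.

```python
from collections import Counter

def majority_vote(model_predictions):
    """Combine per-model predicted labels by simple majority voting.

    model_predictions: one predicted class label per CNN in the ensemble.
    Returns the label chosen by the most models; ties are broken by
    first-seen order, following collections.Counter.
    """
    counts = Counter(model_predictions)
    return counts.most_common(1)[0][0]

# Hypothetical example: five CNNs classify one lip-motion clip into
# one of ten word commands, and three of them agree on "begin".
predictions = ["begin", "begin", "stop", "begin", "choose"]
print(majority_vote(predictions))  # → begin
```

In practice each label would come from the argmax of one CNN's class scores on the detected lip region; hard voting like this needs no calibration across models, which is one reason it is a common ensemble baseline.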
|
author2 |
Cheng-Chin Chiang |
author_facet |
Cheng-Chin Chiang Tsang-Yu Cheng 鄭滄宇 |
author |
Tsang-Yu Cheng 鄭滄宇 |
spellingShingle |
Tsang-Yu Cheng 鄭滄宇 A Lip-Reading System with Deep Learning |
author_sort |
Tsang-Yu Cheng |
title |
A Lip-Reading System with Deep Learning |
title_short |
A Lip-Reading System with Deep Learning |
title_full |
A Lip-Reading System with Deep Learning |
title_fullStr |
A Lip-Reading System with Deep Learning |
title_full_unstemmed |
A Lip-Reading System with Deep Learning |
title_sort |
lip-reading system with deep learning |
publishDate |
2018 |
url |
http://ndltd.ncl.edu.tw/handle/9jhh8b |
work_keys_str_mv |
AT tsangyucheng alipreadingsystemwithdeeplearning AT zhèngcāngyǔ alipreadingsystemwithdeeplearning AT tsangyucheng shēndùxuéxízhīchúnyǔbiànshíxìtǒng AT zhèngcāngyǔ shēndùxuéxízhīchúnyǔbiànshíxìtǒng AT tsangyucheng lipreadingsystemwithdeeplearning AT zhèngcāngyǔ lipreadingsystemwithdeeplearning |
_version_ |
1719173810678136832 |