A Lip-Reading System with Deep Learning

Bibliographic Details
Main Authors: Tsang-Yu Cheng, 鄭滄宇
Other Authors: Cheng-Chin Chiang
Format: Others
Language: zh-TW
Published: 2018
Online Access: http://ndltd.ncl.edu.tw/handle/9jhh8b
id ndltd-TW-106NDHU5392034
record_format oai_dc
spelling ndltd-TW-106NDHU53920342019-05-16T01:07:57Z http://ndltd.ncl.edu.tw/handle/9jhh8b A Lip-Reading System with Deep Learning 深度學習之唇語辨識系統 Tsang-Yu Cheng 鄭滄宇 Master's === National Dong Hwa University === Department of Computer Science and Information Engineering === 106 === In recent years, with the rapid development of technology, convenient high-tech products have become increasingly diversified. However, while enjoying the convenience of these products, users run the risk of identity theft. When a product requires a password for authentication, a user's password entry can be watched or recorded by nearby persons. With the maturity of face recognition technology, some products, such as smartphones, have adopted face recognition to authenticate users and have spawned more convenient services, such as electronic payments. However, when users wear accessories on their faces, these systems may fail to recognize them. In addition, two people who look alike can easily defeat the authentication. To address these weaknesses of face authentication, this thesis designs a lip-reading system that allows users to input passwords by lip motion without uttering a sound. Since others cannot hear the password, the risk of it being overheard or recorded is reduced. Even if others know the password, the lip motion of the same password differs from person to person, further strengthening identity authentication. This study uses the MIRACL-VC1 database for its training and testing samples. The proposed method feeds the lips detected in video frames into multiple Convolutional Neural Networks (CNNs) for feature extraction and recognition. The voting mechanism of the ensemble method then integrates the recognition results of these CNNs to derive the final result. Compared with existing machine-learning methods, this approach uses several network models that complement one another, requiring fewer samples to achieve better performance. On a database of ten word-based lip commands and ten phrase-based lip commands, the system achieves recognition rates of 62% and 58%, respectively. Cheng-Chin Chiang 江政欽 2018 學位論文 ; thesis 36 zh-TW
collection NDLTD
language zh-TW
format Others
sources NDLTD
description Master's === National Dong Hwa University === Department of Computer Science and Information Engineering === 106 === In recent years, with the rapid development of technology, convenient high-tech products have become increasingly diversified. However, while enjoying the convenience of these products, users run the risk of identity theft. When a product requires a password for authentication, a user's password entry can be watched or recorded by nearby persons. With the maturity of face recognition technology, some products, such as smartphones, have adopted face recognition to authenticate users and have spawned more convenient services, such as electronic payments. However, when users wear accessories on their faces, these systems may fail to recognize them. In addition, two people who look alike can easily defeat the authentication. To address these weaknesses of face authentication, this thesis designs a lip-reading system that allows users to input passwords by lip motion without uttering a sound. Since others cannot hear the password, the risk of it being overheard or recorded is reduced. Even if others know the password, the lip motion of the same password differs from person to person, further strengthening identity authentication. This study uses the MIRACL-VC1 database for its training and testing samples. The proposed method feeds the lips detected in video frames into multiple Convolutional Neural Networks (CNNs) for feature extraction and recognition. The voting mechanism of the ensemble method then integrates the recognition results of these CNNs to derive the final result. Compared with existing machine-learning methods, this approach uses several network models that complement one another, requiring fewer samples to achieve better performance.
On a database of ten word-based lip commands and ten phrase-based lip commands, the system achieves recognition rates of 62% and 58%, respectively.
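The abstract's ensemble step — combining the per-command predictions of several CNNs by voting — can be sketched as a simple majority vote. This is an illustrative sketch only, not the thesis's actual code; the function name `ensemble_vote` and the sample labels are hypothetical, and each string stands in for the class label one trained CNN would output for a lip-motion clip.

```python
from collections import Counter

def ensemble_vote(predictions):
    """Majority vote over class labels predicted by several models.

    `predictions` holds one label per model; ties are broken in favor
    of the label encountered first (Counter preserves insertion order).
    """
    return Counter(predictions).most_common(1)[0][0]

# Hypothetical outputs of four CNNs for one lip-motion clip:
# three of the four models agree on the command "stop".
votes = ["stop", "stop", "begin", "stop"]
print(ensemble_vote(votes))  # → stop
```

A majority vote like this only improves on a single model when the individual CNNs make partly independent errors, which is why the abstract stresses that the network models "complement each other."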
author2 Cheng-Chin Chiang
author_facet Cheng-Chin Chiang
Tsang-Yu Cheng
鄭滄宇
author Tsang-Yu Cheng
鄭滄宇
spellingShingle Tsang-Yu Cheng
鄭滄宇
A Lip-Reading System with Deep Learning
author_sort Tsang-Yu Cheng
title A Lip-Reading System with Deep Learning
title_short A Lip-Reading System with Deep Learning
title_full A Lip-Reading System with Deep Learning
title_fullStr A Lip-Reading System with Deep Learning
title_full_unstemmed A Lip-Reading System with Deep Learning
title_sort lip-reading system with deep learning
publishDate 2018
url http://ndltd.ncl.edu.tw/handle/9jhh8b
work_keys_str_mv AT tsangyucheng alipreadingsystemwithdeeplearning
AT zhèngcāngyǔ alipreadingsystemwithdeeplearning
AT tsangyucheng shēndùxuéxízhīchúnyǔbiànshíxìtǒng
AT zhèngcāngyǔ shēndùxuéxízhīchúnyǔbiànshíxìtǒng
AT tsangyucheng lipreadingsystemwithdeeplearning
AT zhèngcāngyǔ lipreadingsystemwithdeeplearning
_version_ 1719173810678136832