A Lip-Reading System with Deep Learning
Master's === National Dong Hwa University === Department of Computer Science and Information Engineering === 106
Main Authors: | Tsang-Yu Cheng, 鄭滄宇 |
---|---|
Other Authors: | Cheng-Chin Chiang |
Format: | Others |
Language: | zh-TW |
Published: | 2018 |
Online Access: | http://ndltd.ncl.edu.tw/handle/9jhh8b |
id |
ndltd-TW-106NDHU5392034 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-TW-106NDHU53920342019-05-16T01:07:57Z http://ndltd.ncl.edu.tw/handle/9jhh8b A Lip-Reading System with Deep Learning 深度學習之唇語辨識系統 Tsang-Yu Cheng 鄭滄宇 Master's === National Dong Hwa University === Department of Computer Science and Information Engineering === 106 Cheng-Chin Chiang 江政欽 2018 degree thesis ; thesis 36 zh-TW |
collection |
NDLTD |
language |
zh-TW |
format |
Others |
sources |
NDLTD |
description |
Master's === National Dong Hwa University === Department of Computer Science and Information Engineering === 106 === In recent years, with the rapid development of technology, convenient high-tech products have become increasingly diverse. However, while enjoying the convenience of these products, users run the risk of identity theft. When a product requires a password for authentication, a user's typing can be watched or recorded by bystanders. With the maturity of face recognition technology, some products, such as smartphones, have adopted face recognition to authenticate users and have spawned more convenient services, such as electronic payments. However, when users wear accessories on their faces, these systems may fail to recognize them. In addition, two people who look alike can easily defeat the authentication. To address these weaknesses of face authentication, this thesis designs a lip-reading system that allows users to input passwords by lip motion without uttering a sound. Since others cannot hear the sound, the chance of the password being overheard or recorded is reduced. Even if others learn the password, the lip motions of different people speaking the same password still differ, thereby enhancing the reliability of identity authentication. This study uses the MIRACL-VC1 database for its training and testing samples. The proposed method feeds the lip regions detected in video frames into multiple Convolutional Neural Networks (CNNs) for feature extraction and recognition. The voting mechanism of the ensemble method is then applied to integrate the recognition results from these multiple CNNs into the final recognition result. Compared with existing machine-learning methods, this thesis uses several network models that complement each other, requiring fewer training samples to achieve better performance. Using a database of ten word-based lip commands and ten phrase-based lip commands, our system achieves recognition rates of 62% and 58%, respectively.
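The ensemble step the abstract describes — integrating the recognition results of multiple CNNs by voting — can be sketched as simple majority voting over the labels each network predicts. This is a minimal illustration under stated assumptions, not the thesis's implementation: the command names, the number of networks, and the stand-in predictions below are hypothetical.

```python
from collections import Counter

def majority_vote(model_predictions):
    """Combine per-model predicted labels by simple majority voting.

    model_predictions: one predicted class label per CNN in the ensemble.
    Returns the label chosen by the most models; ties are broken by
    first-seen order, following collections.Counter.
    """
    counts = Counter(model_predictions)
    return counts.most_common(1)[0][0]

# Hypothetical example: five CNNs classify one lip-motion clip into
# one of ten word commands, and three of them agree on "begin".
predictions = ["begin", "begin", "stop", "begin", "choose"]
print(majority_vote(predictions))  # → begin
```

In practice each label would come from the argmax of one CNN's class scores on the detected lip region; hard voting like this needs no calibration across models, which is one reason it is a common ensemble baseline.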
|
author2 |
Cheng-Chin Chiang |
author_facet |
Cheng-Chin Chiang Tsang-Yu Cheng 鄭滄宇 |
author |
Tsang-Yu Cheng 鄭滄宇 |
spellingShingle |
Tsang-Yu Cheng 鄭滄宇 A Lip-Reading System with Deep Learning |
author_sort |
Tsang-Yu Cheng |
title |
A Lip-Reading System with Deep Learning |
title_short |
A Lip-Reading System with Deep Learning |
title_full |
A Lip-Reading System with Deep Learning |
title_fullStr |
A Lip-Reading System with Deep Learning |
title_full_unstemmed |
A Lip-Reading System with Deep Learning |
title_sort |
lip-reading system with deep learning |
publishDate |
2018 |
url |
http://ndltd.ncl.edu.tw/handle/9jhh8b |
work_keys_str_mv |
AT tsangyucheng alipreadingsystemwithdeeplearning AT zhèngcāngyǔ alipreadingsystemwithdeeplearning AT tsangyucheng shēndùxuéxízhīchúnyǔbiànshíxìtǒng AT zhèngcāngyǔ shēndùxuéxízhīchúnyǔbiànshíxìtǒng AT tsangyucheng lipreadingsystemwithdeeplearning AT zhèngcāngyǔ lipreadingsystemwithdeeplearning |
_version_ |
1719173810678136832 |