On the Use of Video and Speech Sensing for Designing an Uttering-Care System to the Group of Abnormal Faces
Master's thesis === National Formosa University === Master's Program, Department of Electrical Engineering === 107 === Mature speech recognition technology has generally been well accepted by people with normal facial conditions. However, the voice recognition systems on the market today do not consider patients with facial nerve disorders (i.e....
Main Authors: | YEN, CHENG-YU, 顏承宇 |
---|---|
Other Authors: | DING, ING-JR |
Format: | Others |
Language: | zh-TW |
Published: | 2019 |
Online Access: | http://ndltd.ncl.edu.tw/handle/qheg4b |
id |
ndltd-TW-107NYPI0441022 |
---|---|
record_format |
oai_dc |
collection |
NDLTD |
language |
zh-TW |
format |
Others |
sources |
NDLTD |
description |
Master's thesis === National Formosa University === Master's Program, Department of Electrical Engineering === 107 === Mature speech recognition technology has generally been well accepted by people with normal facial conditions. However, the voice recognition systems on the market today do not consider patients with facial nerve disorders (i.e. abnormal faces). General speech recognition is therefore quite inconvenient for this vulnerable group. This thesis develops a friendly voice recognition system that can be properly tailored to specific users whose faces are abnormal due to facial nerve disorders.
The system developed in this thesis simultaneously captures voice and facial features. Two different recognition approaches, dynamic time warping (DTW) and a convolutional neural network (CNN), are used for recognition decisions in the system. The designed system provides objective references and suggestive information for both the abnormal-face group and professional caregivers. This thesis uses the above-mentioned methods to construct a practical application system for the abnormal-face group. The constructed system supports two types of care applications: rehabilitation-degree detection and single-word (Mandarin syllable) recognition.
The DTW method in this thesis mainly uses 20-dimensional face features, converting the original spatial coordinates of the face captured by a three-dimensional image sensor into feature values. The designed 20-dimensional face features contain two different types, the 'distance' and 'area' classes. For each type, 10-dimensional parameters are designed, i.e. 10-dimensional distance and 10-dimensional area feature parameters. DTW with the designed face features is applied to two tasks: degree detection of abnormal faces and word recognition. In this part of the study, the distortion distance calculated during template matching serves as an important parameter in the task of detecting the degree of facial abnormality.
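The abstract does not specify which landmark pairs and triangles define the 10 distance and 10 area features, so the two feature helpers below are hypothetical illustrations; the DTW recurrence itself is the standard one, with the final cumulative cost playing the role of the template-matching distortion distance. A minimal sketch in Python:

```python
import math

def landmark_distance(p, q):
    """Euclidean distance between two 3-D face landmarks (hypothetical feature)."""
    return math.dist(p, q)

def triangle_area(a, b, c):
    """Area of the triangle spanned by three 3-D landmarks (hypothetical feature):
    half the magnitude of the cross product of two edge vectors."""
    u = [b[i] - a[i] for i in range(3)]
    v = [c[i] - a[i] for i in range(3)]
    cross = [u[1]*v[2] - u[2]*v[1],
             u[2]*v[0] - u[0]*v[2],
             u[0]*v[1] - u[1]*v[0]]
    return 0.5 * math.sqrt(sum(x * x for x in cross))

def dtw_distortion(template, observed):
    """Standard DTW: cumulative distortion distance between two sequences
    of feature vectors (e.g. frames of the 20-dimensional face features)."""
    n, m = len(template), len(observed)
    INF = float("inf")
    # D[i][j] = minimum cumulative cost aligning template[:i] with observed[:j]
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(template[i - 1], observed[j - 1])  # local cost
            D[i][j] = cost + min(D[i - 1][j],        # stretch the template
                                 D[i][j - 1],        # stretch the observation
                                 D[i - 1][j - 1])    # one-to-one match
    return D[n][m]
```

An utterance whose feature trajectory closely follows a reference template yields a small distortion distance; larger distances can then serve as a rough indicator of the degree of facial abnormality, as the thesis describes.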
The CNN deep learning method uses the VGG19 network of the VGG architecture family. The entire VGG19 network contains 19 weight layers: 16 convolutional layers (interleaved with pooling operations) and 3 fully connected layers. In the CNN method, the video and voice data of the person with an abnormal face are first captured by the sensor, and the VGG19 model is then used to determine the degree of facial abnormality and the pronunciation result of the uttered single word. Four different databases are designed for performance evaluation of the CNN model: face RGB images labeled by facial-abnormality degree, face RGB images labeled by the uttered single word, speech spectrum images labeled by facial-abnormality degree, and speech spectrum images labeled by the uttered single word.
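VGG19's layer budget can be checked directly from its standard configuration (the 'E' column of the original VGG design): sixteen 3x3 convolutions interleaved with 2x2 max-pooling, followed by three fully connected layers. The framework-free tally below reproduces those counts and the well-known parameter total for 224x224 RGB inputs and 1000 output classes; for the four databases in this thesis the final layer's class count would of course differ.

```python
# VGG19 configuration 'E': numbers are conv output channels, "M" is 2x2 max-pool.
VGG19_CFG = [64, 64, "M", 128, 128, "M", 256, 256, 256, 256, "M",
             512, 512, 512, 512, "M", 512, 512, 512, 512, "M"]

def vgg19_summary(num_classes=1000, in_channels=3, in_size=224):
    """Count conv/FC layers and parameters of VGG19 without any DL framework."""
    convs, params, ch, size = 0, 0, in_channels, in_size
    for item in VGG19_CFG:
        if item == "M":
            size //= 2                            # max-pool halves spatial size
        else:
            params += ch * item * 3 * 3 + item    # 3x3 kernel weights + biases
            ch = item
            convs += 1
    flat = ch * size * size                       # 512 * 7 * 7 = 25088 for 224 input
    fc_dims = [flat, 4096, 4096, num_classes]
    fcs = len(fc_dims) - 1
    for d_in, d_out in zip(fc_dims, fc_dims[1:]):
        params += d_in * d_out + d_out            # dense weights + biases
    return convs, fcs, params

convs, fcs, total = vgg19_summary()
print(convs, fcs, total)  # 16 conv + 3 FC = 19 weight layers
```

In practice one would instantiate VGG19 in a deep learning framework and replace the 1000-way output layer with one sized to the task (e.g. the abnormality-degree classes or the single-word vocabulary), but the layer arithmetic above is framework-independent.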
In the experiments, persons with abnormal faces are divided into two groups: left-side abnormal faces and right-side abnormal faces. The system designed in this thesis targets single-person care applications, and the reported CNN recognition performance is the averaged recognition result over 10 different users. With the DTW method and the designed 20-dimensional face features, recognition of the abnormal-face degree achieves accuracies of 84.49% (left-side abnormal faces) and 84.44% (right-side abnormal faces); recognition of single-word pronunciation achieves accuracies of 83.38% (left-side) and 82.77% (right-side). With the CNN deep learning method, face RGB data outperforms speech spectrum data in both recognition tasks: recognition of the abnormal-face degree achieves accuracies of 99.14% (left-side abnormal faces) and 98.59% (right-side abnormal faces); recognition of single-word pronunciation achieves accuracies of 95.25% (left-side) and 92.36% (right-side).
|
author2 |
DING, ING-JR |
author_facet |
DING, ING-JR YEN, CHENG-YU 顏承宇 |
author |
YEN, CHENG-YU 顏承宇 |
spellingShingle |
YEN, CHENG-YU 顏承宇 On the Use of Video and Speech Sensing for Designing an Uttering-Care System to the Group of Abnormal Faces |
author_sort |
YEN, CHENG-YU |
title |
On the Use of Video and Speech Sensing for Designing an Uttering-Care System to the Group of Abnormal Faces |
title_short |
On the Use of Video and Speech Sensing for Designing an Uttering-Care System to the Group of Abnormal Faces |
title_full |
On the Use of Video and Speech Sensing for Designing an Uttering-Care System to the Group of Abnormal Faces |
title_fullStr |
On the Use of Video and Speech Sensing for Designing an Uttering-Care System to the Group of Abnormal Faces |
title_full_unstemmed |
On the Use of Video and Speech Sensing for Designing an Uttering-Care System to the Group of Abnormal Faces |
title_sort |
on the use of video and speech sensing for designing an uttering-care system to the group of abnormal faces |
publishDate |
2019 |
url |
http://ndltd.ncl.edu.tw/handle/qheg4b |
work_keys_str_mv |
AT yenchengyu ontheuseofvideoandspeechsensingfordesigninganutteringcaresystemtothegroupofabnormalfaces AT yánchéngyǔ ontheuseofvideoandspeechsensingfordesigninganutteringcaresystemtothegroupofabnormalfaces AT yenchengyu yùnyòngshìxùnjíyīnxùngǎncèzhīfēizhèngchángrényánmiànzúqúndefāyīnzhàohùxìtǒngshèjìyánjiū AT yánchéngyǔ yùnyòngshìxùnjíyīnxùngǎncèzhīfēizhèngchángrényánmiànzúqúndefāyīnzhàohùxìtǒngshèjìyánjiū |
_version_ |
1719262613847670784 |
spelling |
ndltd-TW-107NYPI04410222019-10-06T03:35:30Z http://ndltd.ncl.edu.tw/handle/qheg4b On the Use of Video and Speech Sensing for Designing an Uttering-Care System to the Group of Abnormal Faces 運用視訊及音訊感測之非正常人顏面族群的發音照護系統設計研究 YEN, CHENG-YU 顏承宇 DING, ING-JR 丁英智 2019 學位論文 ; thesis 223 zh-TW |