An application of Speech Recognition use Markov Hidden Model on Controlling TV System



Bibliographic Details
Main Authors: Chun-Che Yang, 楊俊哲
Other Authors: 陳國在
Format: Others
Language: zh-TW
Published: 2012
Online Access: http://ndltd.ncl.edu.tw/handle/50870746484579782249
Description
Summary: Master's === National Taiwan University === Graduate Institute of Engineering Science and Ocean Engineering === 100 === Speech recognition has long played an important role in science and technology, and it will continue to do so. As electronic equipment and signal processing technology develop rapidly alongside cloud computing, speech recognition is becoming more useful than ever. Speech control means that machines can understand human language. In this study, the speech recognition system is based on hidden Markov models (HMMs). The framework relies on basic signal processing and Mel-scale frequency cepstral coefficients (MFCCs), and the Viterbi algorithm is applied to the speech models to identify spoken commands. The speech signal is first preprocessed so that it is easier to work with, and MFCCs are then extracted as the speech features. Because good features depend on it, the quality of the speech recording is key to recognition. This study describes an embedded system operated by speech control: HMM-based Chinese speech recognition is applied to a control system, and a TV is controlled in an experiment that uses a specific set of voice commands. The experiment is divided into three configurations. In the first, the recognized speech is compared with the original speech, and a recognition rate of 97.4% is obtained. In the second, speech that was not used in training is recognized against the speech database, and a recognition rate of 92.4% is obtained. In the third, speech from outside the experimental data is recognized and then used for control through a decision tree, to check whether the action requested by the speech is actually carried out; a control accuracy of 96% is obtained.
The experimental results show that an appropriate choice of the number of states yields a high recognition rate, whereas too many or too few states lower it. Too many Gaussian mixtures complicate the computation, and the recognition rate no longer increases. Finally, the study explores how to integrate the speech recognition technology it builds with the related hardware control technology for TV control.
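The recognition step described in the summary (score an utterance against one HMM per command with the Viterbi algorithm, then pick the best-scoring model) can be illustrated with a minimal sketch. It is not the thesis implementation: the thesis models MFCC features with Gaussian mixtures, whereas this sketch assumes discrete observation symbols for brevity, and the command names, state counts, and probabilities below are all hypothetical.

```python
import math


def _log(p):
    """Natural log that maps probability 0 to -inf instead of raising."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi_log_score(obs, start, trans, emit):
    """Log-probability of the single best state path for `obs`.

    start[s]    : probability of starting in state s
    trans[p][s] : probability of moving from state p to state s
    emit[s][o]  : probability of state s emitting symbol o
    """
    n_states = len(start)
    # Initialise with the first observation.
    delta = [_log(start[s]) + _log(emit[s][obs[0]]) for s in range(n_states)]
    # Recursion: keep only the best predecessor for each state.
    for o in obs[1:]:
        delta = [
            max(delta[p] + _log(trans[p][s]) for p in range(n_states))
            + _log(emit[s][o])
            for s in range(n_states)
        ]
    return max(delta)


def recognize(obs, models):
    """Return the name of the model with the highest Viterbi score."""
    return max(models, key=lambda name: viterbi_log_score(obs, *models[name]))


# Hypothetical two-state, left-to-right models over symbols {0, 1, 2}.
LEFT_TO_RIGHT = [[0.6, 0.4], [0.0, 1.0]]
MODELS = {
    "volume_up": ([1.0, 0.0], LEFT_TO_RIGHT, [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1]]),
    "channel_up": ([1.0, 0.0], LEFT_TO_RIGHT, [[0.1, 0.1, 0.8], [0.1, 0.8, 0.1]]),
}

if __name__ == "__main__":
    print(recognize([0, 0, 1, 1], MODELS))
    print(recognize([2, 2, 1, 1], MODELS))
```

In a sketch like this, the number of states per model is the length of `start`, which is the parameter the summary reports as needing to be neither too large nor too small.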