Acoustic Model Training for Speech Recognition

Master's === National Taipei University === Graduate Institute of Communication Engineering === 103 === Speech is the primary means of communication among human beings. It has underlain the progress of human evolution since time immemorial. Speech communication is therefore the thread that is interwoven in the...

Bibliographic Details
Main Authors:	Nhlanhla Simanga Dlamini, 嵐奇
Other Authors:	CHEN-YU Chiang
Format:	Others
Language:	en_US
Published:	2015
Online Access:	http://ndltd.ncl.edu.tw/handle/39876544496883454574

id	ndltd-TW-103NTPU0650013
record_format	oai_dc
spelling	ndltd-TW-103NTPU0650013 2016-08-19T04:10:36Z http://ndltd.ncl.edu.tw/handle/39876544496883454574 Acoustic Model Training for Speech Recognition 語音辨識系統之聲學模型訓練研究 Nhlanhla Simanga Dlamini 嵐奇 Master's, National Taipei University, Graduate Institute of Communication Engineering, 103 CHEN-YU Chiang 江振宇 2015 學位論文 ; thesis 79 en_US
collection	NDLTD
language	en_US
format	Others
sources	NDLTD
description	Master's === National Taipei University === Graduate Institute of Communication Engineering === 103 === Speech is the primary means of communication among human beings. It has underlain the progress of human evolution since time immemorial and is interwoven in the fabric of every human culture. A compelling reason to study and work with speech is that it is indubitably the most common form of communication within the human community, which makes speech communication ubiquitous and pervasive. This thesis investigates and develops acoustic model training for a speech recognition system, which is the first step in building such a system. Underlying such training is a speech recognition engine whose applications are manifold. They include, inter alia: 1. Small-vocabulary keyword recognition over dial-up telephone lines. 2. Medium-size-vocabulary voice-interactive command and control systems, e.g. IVR in telecommunications and voice-activated systems in automobiles, in banks and for the disabled community. 3. Limited-domain speech translation. Training refers to the process of parameter estimation that seeks to maximize the likelihood of the observations given a string of words. A hierarchical approach is adopted in which different sources of information are represented. This is motivated by the fact that speech recognition depends on the vocabulary, the language model and the hidden Markov models (HMMs). In this thesis, speech is modeled by HMM states, the pronunciation dictionary is the TIMIT dictionary and the training corpus is the TIMIT training data. A network representing these three sources of information is built using finite-state transducer (FST) technology.
The motivation for this study is twofold. It is motivated by academic requirements as well as by the professional wish to fulfill a long-standing desire to do something great. From the academic perspective, algorithms, methods and techniques (mathematical and computational) are sought to conceptualize training as a first step in building a speech recognition system. The success of this academic endeavor will culminate in the application of similar ideas in industry. Professionally, as a telecommunication engineer devoted to this study, I can think of immediate applications of this research in industry. Speech communication can be extended to human-machine communication. Making a machine recognize and respond to speech is a pattern recognition task, which this thesis endeavors to investigate. Success in this extension implies a growing domain of speech communication, and that success is attributable to the wider community of speech researchers and professionals. As this domain grows, businesses will also wish to embrace human-machine speech communication. One of the driving forces is the economic and social convenience that comes with this form of communication. The benefits of embracing such a technology are considerable: interacting with a machine will be effortless, which will increase the number of people who make use of the technology, e.g. the disabled and elderly communities.
author2	CHEN-YU Chiang
author_facet	CHEN-YU Chiang Nhlanhla Simanga Dlamini 嵐奇
author	Nhlanhla Simanga Dlamini 嵐奇
spellingShingle	Nhlanhla Simanga Dlamini 嵐奇 Acoustic Model Training for Speech Recognition
author_sort	Nhlanhla Simanga Dlamini
title	Acoustic Model Training for Speech Recognition
title_short	Acoustic Model Training for Speech Recognition
title_full	Acoustic Model Training for Speech Recognition
title_fullStr	Acoustic Model Training for Speech Recognition
title_full_unstemmed	Acoustic Model Training for Speech Recognition
title_sort	acoustic model training for speech recognition
publishDate	2015
url	http://ndltd.ncl.edu.tw/handle/39876544496883454574
work_keys_str_mv	AT nhlanhlasimangadlamini acousticmodeltrainingforspeechrecognition AT lánqí acousticmodeltrainingforspeechrecognition AT nhlanhlasimangadlamini yǔyīnbiànshíxìtǒngzhīshēngxuémóxíngxùnliànyánjiū AT lánqí yǔyīnbiànshíxìtǒngzhīshēngxuémóxíngxùnliànyánjiū
_version_	1718378670429569024

Similar Items