Fast-LSTM Acoustic Model for Distant Speech Recognition and Wake-up-word Task

Master's === National Central University === Department of Computer Science and Information Engineering === 105 === Automatic speech recognition (ASR) has developed rapidly in recent years within machine learning research. ASR is applied in many everyday settings, such as smart assistants and subtitle generation. In this thesis, we propose two systems....


Bibliographic Details
Main Authors:	Rezki Trianto, 特利安
Other Authors:	Dr. Jia Ching Wang
Format:	Others
Language:	en_US
Published:	2017
Online Access:	http://ndltd.ncl.edu.tw/handle/2x5s95

id	ndltd-TW-105NCU05392143
record_format	oai_dc
spelling	ndltd-TW-105NCU05392143 2019-05-16T00:08:09Z http://ndltd.ncl.edu.tw/handle/2x5s95 Fast-LSTM Acoustic Model for Distant Speech Recognition and Wake-up-word Task 快速-長短期記憶聲學模型於遠距語音辨識及喚醒關鍵字任務 Rezki Trianto 特利安 Master's, National Central University, Department of Computer Science and Information Engineering, 105 Dr. Jia Ching Wang 王家慶博士 2017 thesis 75 en_US
collection	NDLTD
language	en_US
format	Others
sources	NDLTD
description	Master's === National Central University === Department of Computer Science and Information Engineering === 105 === Automatic speech recognition (ASR) has developed rapidly in recent years within machine learning research. ASR is applied in many everyday settings, such as smart assistants and subtitle generation. In this thesis, we propose two systems. The first is an automatic speech recognition system that uses Fast-LSTM acoustic models. It adopts the TDNN architecture in its initial layers to learn short-term temporal features of the input, followed by several LSTM layers stacked above them. The experiments use the CHiME3 dataset, which focuses on distant-talking, multi-channel audio. As the front end, a BLSTM-based GEV beamformer is used to improve the quality of the speech utterances. In the experimental results, the Fast-LSTM model trains faster than the standard LSTM or DNN. However, the DNN attains a lower error rate than either the LSTM or the Fast-LSTM, achieving a word error rate of 4.87%. Some limitations of the training process are discussed in this thesis. The second system implements the wake-up-word task, a sub-task of speech recognition. The trained Fast-LSTM model serves as the acoustic model in a two-step classification scheme that uses a confidence measure for each phoneme generated from the keyword to detect the keyword. The system detects keywords well, with an error rate of 10%.
author2	Dr. Jia Ching Wang
author_facet	Dr. Jia Ching Wang Rezki Trianto 特利安
author	Rezki Trianto 特利安
spellingShingle	Rezki Trianto 特利安 Fast-LSTM Acoustic Model for Distant Speech Recognition and Wake-up-word Task
author_sort	Rezki Trianto
title	Fast-LSTM Acoustic Model for Distant Speech Recognition and Wake-up-word Task
title_sort	fast-lstm acoustic model for distant speech recognition and wake-up-word task
publishDate	2017
url	http://ndltd.ncl.edu.tw/handle/2x5s95
