Fast-LSTM Acoustic Model for Distant Speech Recognition and Wake-up-word Task

Master's thesis === National Central University === Department of Computer Science and Information Engineering === 105 === Automatic speech recognition (ASR) has developed rapidly in recent years within machine learning research. ASR is now applied in everyday life, for example in smart assistants and subtitle generation. In this thesis, we propose two systems....


Bibliographic Details
Main Authors: Rezki Trianto (特利安)
Other Authors: Dr. Jia Ching Wang
Format: Others
Language: en_US
Published: 2017
Online Access: http://ndltd.ncl.edu.tw/handle/2x5s95
id ndltd-TW-105NCU05392143
record_format oai_dc
spelling ndltd-TW-105NCU05392143 2019-05-16T00:08:09Z http://ndltd.ncl.edu.tw/handle/2x5s95 Fast-LSTM Acoustic Model for Distant Speech Recognition and Wake-up-word Task 快速-長短期記憶聲學模型於遠距語音辨識及喚醒關鍵字任務 Rezki Trianto 特利安 Master's thesis National Central University Department of Computer Science and Information Engineering 105 Dr. Jia Ching Wang 王家慶 2017 thesis 75 en_US
collection NDLTD
language en_US
format Others
sources NDLTD
description Master's thesis === National Central University === Department of Computer Science and Information Engineering === 105 === Automatic speech recognition (ASR) has developed rapidly in recent years within machine learning research. ASR is now applied in everyday life, for example in smart assistants and subtitle generation. In this thesis, we propose two systems. The first is an automatic speech recognition system that uses Fast-LSTM acoustic models. It uses a TDNN architecture in the initial layers to learn short-term temporal features of the input, followed by several LSTM layers above them. The experiments use the CHiME3 dataset, which focuses on distant-talking, multi-channel audio. As the front end, a GEV beamformer combined with a BLSTM network is used to improve the quality of the speech utterances. In the experimental results, the Fast-LSTM model trains faster than the standard LSTM or DNN. However, the DNN obtains a better error rate than the LSTM or Fast-LSTM, achieving a word error rate of 4.87%. Some limitations of the training process are discussed in this thesis. The second system implements the wake-up-word task, a sub-task of speech recognition. The trained Fast-LSTM model serves as the acoustic model; the system applies two-step classification and uses confidence measures for each phoneme generated from the keyword to detect the keyword. The system detects keywords well, with a 10% error rate.
author2 Dr. Jia Ching Wang
author_facet Dr. Jia Ching Wang
Rezki Trianto
特利安
author Rezki Trianto
特利安
spellingShingle Rezki Trianto
特利安
Fast-LSTM Acoustic Model for Distant Speech Recognition and Wake-up-word Task
author_sort Rezki Trianto
title Fast-LSTM Acoustic Model for Distant Speech Recognition and Wake-up-word Task
title_short Fast-LSTM Acoustic Model for Distant Speech Recognition and Wake-up-word Task
title_full Fast-LSTM Acoustic Model for Distant Speech Recognition and Wake-up-word Task
title_fullStr Fast-LSTM Acoustic Model for Distant Speech Recognition and Wake-up-word Task
title_full_unstemmed Fast-LSTM Acoustic Model for Distant Speech Recognition and Wake-up-word Task
title_sort fast-lstm acoustic model for distant speech recognition and wake-up-word task
publishDate 2017
url http://ndltd.ncl.edu.tw/handle/2x5s95
work_keys_str_mv AT rezkitrianto fastlstmacousticmodelfordistantspeechrecognitionandwakeupwordtask
AT tèlìān fastlstmacousticmodelfordistantspeechrecognitionandwakeupwordtask
AT rezkitrianto kuàisùzhǎngduǎnqījìyìshēngxuémóxíngyúyuǎnjùyǔyīnbiànshíjíhuànxǐngguānjiànzìrènwù
AT tèlìān kuàisùzhǎngduǎnqījìyìshēngxuémóxíngyúyuǎnjùyǔyīnbiànshíjíhuànxǐngguānjiànzìrènwù
_version_ 1719160901208113152