Fast-LSTM Acoustic Model for Distant Speech Recognition and Wake-up-word Task
Master's === National Central University === Department of Computer Science and Information Engineering === Academic Year 105 === Automatic speech recognition (ASR) has developed very rapidly in recent years as an area of machine learning research, and it is now used in many everyday applications, such as smart assistants and subtitle generation. In this thesis, we propose two systems...
Main Authors: | Rezki Trianto, 特利安 |
---|---|
Other Authors: | Dr. Jia Ching Wang |
Format: | Others |
Language: | en_US |
Published: | 2017 |
Online Access: | http://ndltd.ncl.edu.tw/handle/2x5s95 |
id |
ndltd-TW-105NCU05392143 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-TW-105NCU05392143 2019-05-16T00:08:09Z http://ndltd.ncl.edu.tw/handle/2x5s95 Fast-LSTM Acoustic Model for Distant Speech Recognition and Wake-up-word Task 快速-長短期記憶聲學模型於遠距語音辨識及喚醒關鍵字任務 Rezki Trianto 特利安 Master's National Central University Department of Computer Science and Information Engineering 105 Dr. Jia Ching Wang 王家慶 Ph.D. 2017 thesis 75 en_US |
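The abstract describes the Fast-LSTM acoustic model only at a high level: TDNN layers at the bottom learn short-term temporal features of the input, and several LSTM layers sit above them. The Python sketch below illustrates the TDNN half under stated assumptions. The splicing offsets in `TDNN_OFFSETS` are hypothetical (the record does not give the thesis's configuration), and `receptive_field` and `splice` are illustrative helpers, not code from the thesis.

```python
from typing import List, Sequence, Tuple

# Hypothetical per-layer splicing offsets for the TDNN layers placed below
# the LSTM layers. Illustrative values only; not taken from the thesis.
TDNN_OFFSETS: List[Tuple[int, ...]] = [(-2, -1, 0, 1, 2), (-1, 0, 1), (-3, 0, 3)]


def receptive_field(layer_offsets: Sequence[Sequence[int]]) -> Tuple[int, int]:
    """Return the (left, right) input-frame extent seen by one output frame.

    Each layer widens the context by its most negative and most positive
    offset, so the extents simply add up across stacked layers.
    """
    left = right = 0
    for offsets in layer_offsets:
        left += min(offsets)
        right += max(offsets)
    return left, right


def splice(frames: Sequence[Sequence[float]], t: int, offsets: Sequence[int]) -> List[float]:
    """Concatenate the feature vectors at t + offset for one TDNN layer input.

    Indices outside the utterance are clamped, i.e. edge frames are repeated
    instead of padding with zeros.
    """
    last = len(frames) - 1
    out: List[float] = []
    for o in offsets:
        idx = min(max(t + o, 0), last)
        out.extend(frames[idx])
    return out
```

With these assumed offsets, `receptive_field(TDNN_OFFSETS)` gives (-6, 6), so each frame handed to the LSTM layers already summarizes a 13-frame window. TDNN layers have no recurrence and can be evaluated for all frames in parallel, which is one plausible contributor to the faster training reported in the abstract; the record itself does not state the reason.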
collection |
NDLTD |
language |
en_US |
format |
Others
|
sources |
NDLTD |
description |
Master's === National Central University === Department of Computer Science and Information Engineering === Academic Year 105 === Automatic speech recognition (ASR) has developed very rapidly in recent years as an area of machine learning research, and it is now used in many everyday applications, such as smart assistants and subtitle generation. In this thesis, we propose two systems. The first is an automatic speech recognition system that uses Fast-LSTM acoustic models. The proposed model uses a TDNN architecture in its initial layers to learn short-term temporal features of the input, followed by several LSTM layers. The experiments use the CHiME3 dataset, which focuses on distant-talking, multi-channel audio. As the front end, a GEV beamformer driven by a BLSTM network improves the quality of the speech utterances. In the experiments, the Fast-LSTM model trains faster than both the standard LSTM and the DNN. However, the DNN achieves a lower error rate than either the LSTM or the Fast-LSTM, reaching a word error rate of 4.87%. The thesis also discusses some limitations of the training process.
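The record says only that the front end is a GEV (generalized eigenvalue) beamformer driven by a BLSTM network. In the commonly used mask-based formulation, assumed here rather than confirmed by the record, the BLSTM predicts a speech mask and a noise mask for every time-frequency bin, the masks weight the spatial covariance estimates, and the beamforming vector maximizes the output signal-to-noise ratio in each frequency bin. In the block, y(t,f) is the multi-channel STFT vector and M_X, M_N are the speech and noise masks; these symbol names are assumptions.

```latex
\begin{align*}
\Phi_{\nu\nu}(f) &= \frac{\sum_{t} M_{\nu}(t,f)\,\mathbf{y}(t,f)\,\mathbf{y}^{\mathsf{H}}(t,f)}{\sum_{t} M_{\nu}(t,f)},
  \qquad \nu \in \{X, N\} \\
\mathbf{w}_{\mathrm{GEV}}(f) &= \operatorname*{arg\,max}_{\mathbf{w}}
  \frac{\mathbf{w}^{\mathsf{H}}\,\Phi_{XX}(f)\,\mathbf{w}}{\mathbf{w}^{\mathsf{H}}\,\Phi_{NN}(f)\,\mathbf{w}} \\
\hat{X}(t,f) &= \mathbf{w}_{\mathrm{GEV}}^{\mathsf{H}}(f)\,\mathbf{y}(t,f)
\end{align*}
```

The maximizer is the principal eigenvector of the generalized eigenvalue problem Phi_XX(f) w = lambda Phi_NN(f) w. The normalization by the mask sums does not change it, because scaling either covariance matrix only rescales the eigenvalues.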
The second system implements a wake-up-word task, a subtask of speech recognition. The trained Fast-LSTM model serves as the acoustic model, and keywords are detected with a two-step classification that uses a confidence measure for each phoneme generated from the keyword. The system detects keywords well, with an error rate of 10%.
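The record does not spell out the two-step classification. One plausible reading, sketched here in Python as an assumption rather than the thesis's method: step 1 scores each phoneme of the keyword by averaging the acoustic model's posterior for that phoneme over the frames aligned to it, and step 2 accepts the keyword only if every phoneme's confidence clears a threshold. The posterior format, the segment tuples, `phoneme_confidences`, `detect_keyword`, and the 0.5 default threshold are all hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical input format: one dict per frame mapping a phoneme label to
# the acoustic model's posterior probability for that frame.
Posteriors = Sequence[Dict[str, float]]


def phoneme_confidences(
    posteriors: Posteriors,
    segments: Sequence[Tuple[str, int, int]],
) -> List[float]:
    """Step 1: score each keyword phoneme.

    `segments` lists (phoneme, start_frame, end_frame), end exclusive, e.g.
    from an alignment of the keyword. A phoneme's confidence is the mean
    posterior of that phoneme over its frames (0.0 for an empty segment).
    """
    scores: List[float] = []
    for phone, start, end in segments:
        frames = posteriors[start:end]
        if not frames:
            scores.append(0.0)
            continue
        scores.append(sum(f.get(phone, 0.0) for f in frames) / len(frames))
    return scores


def detect_keyword(confidences: Sequence[float], threshold: float = 0.5) -> bool:
    """Step 2: accept the keyword only if every phoneme is confident enough."""
    return bool(confidences) and min(confidences) >= threshold
```

For a two-phoneme keyword whose phonemes average 0.8 and 0.85, `detect_keyword` accepts at the default threshold and rejects at 0.82. Taking the minimum rather than the mean means one poorly matched phoneme is enough to reject, trading some missed detections for fewer false alarms.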
|
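The 4.87% figure is a word error rate (WER), the standard ASR metric: the number of word substitutions, deletions, and insertions in the minimum edit-distance alignment between the recognizer's output and the reference transcript, divided by the number of reference words. A minimal Python sketch of that computation follows; `word_error_rate` is a hypothetical helper, not the scoring script used in the thesis.

```python
from typing import List, Sequence


def word_error_rate(reference: Sequence[str], hypothesis: Sequence[str]) -> float:
    """WER = (substitutions + deletions + insertions) / len(reference).

    Uses the standard Levenshtein dynamic program over words, keeping only
    the previous row of the distance table.
    """
    if not reference:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between an empty reference prefix and hypothesis[:j]
    prev: List[int] = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        curr = [i] + [0] * len(hypothesis)
        for j, hyp_word in enumerate(hypothesis, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref_word
                curr[j - 1] + 1,     # insertion of hyp_word
                prev[j - 1] + cost,  # substitution, or match when cost == 0
            )
        prev = curr
    return prev[-1] / len(reference)
```

For example, the reference "the cat sat on the mat" against the hypothesis "the cat sit on mat" has one substitution and one deletion, giving 2/6, or about 33%. A WER of 4.87% therefore corresponds to roughly 4.87 word errors per 100 reference words.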
author2 |
Dr. Jia Ching Wang |
author_facet |
Dr. Jia Ching Wang Rezki Trianto 特利安 |
author |
Rezki Trianto 特利安 |
author_sort |
Rezki Trianto |
title |
Fast-LSTM Acoustic Model for Distant Speech Recognition and Wake-up-word Task |
publishDate |
2017 |
url |
http://ndltd.ncl.edu.tw/handle/2x5s95 |