Wake-up Word Detection Using Long Short Term Memory Network and Connectionist Temporal Classification
Master's === National Central University === Department of Computer Science and Information Engineering === 107 === With the development of deep learning, applications of artificial intelligence have become increasingly popular, and the performance of speech recognition has improved considerably. Wake-up word detection is also called keyword spotting, and it deals with the identificati...
Main Authors: | YU-SIN JHOU, 周郁馨 |
---|---|
Other Authors: | 王家慶 |
Format: | Others |
Language: | zh-TW |
Published: | 2019 |
Online Access: | http://ndltd.ncl.edu.tw/handle/tu5xzc |
id |
ndltd-TW-107NCU05392168 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-TW-107NCU05392168 2019-10-22T05:28:16Z http://ndltd.ncl.edu.tw/handle/tu5xzc Wake-up Word Detection Using Long Short Term Memory Network and Connectionist Temporal Classification 基於長短期記憶網路和連結時序分類的喚醒詞辨識 YU-SIN JHOU 周郁馨 Master's, National Central University, Department of Computer Science and Information Engineering, 107. With the development of deep learning, applications of artificial intelligence have become increasingly popular, and the performance of speech recognition has improved considerably. Wake-up word detection, also called keyword spotting, deals with identifying a keyword in an audio signal. Deep learning currently performs better than traditional approaches such as the hidden Markov model (HMM). To obtain a deep learning wake-up word model (for example, a deep neural network or a recurrent neural network), we have to use a large amount of audio of the specific word to train the model, so that it can learn the features of the wake-up word audio and predict whether the wake-up word is present in a continuous audio signal. However, these keyword detection systems can only detect a fixed keyword. If we want to change the keyword or add a new keyword to the system, we have to collect new keyword-specific data and re-train the model. In this thesis, we use a long short-term memory (LSTM) network and connectionist temporal classification (CTC) as the keyword detection model. It differs from general keyword detection in that the system uses the LSTM to predict phoneme posteriors and CTC to produce the probability of the phoneme sequence. Because the model predicts phoneme sequences, we can use non-keyword data as training data so that the model predicts sequences more accurately. In addition, when the wake-up word is changed, the system does not have to be re-trained; we only need some new wake-up word data to modify the system. 王家慶 2019 學位論文 ; thesis 32 zh-TW |
collection |
NDLTD |
language |
zh-TW |
format |
Others |
sources |
NDLTD |
description |
Master's === National Central University === Department of Computer Science and Information Engineering === 107 === With the development of deep learning, applications of artificial intelligence have become increasingly popular, and the performance of speech recognition has improved considerably. Wake-up word detection, also called keyword spotting, deals with identifying a keyword in an audio signal. Deep learning currently performs better than traditional approaches such as the hidden Markov model (HMM). To obtain a deep learning wake-up word model (for example, a deep neural network or a recurrent neural network), we have to use a large amount of audio of the specific word to train the model, so that it can learn the features of the wake-up word audio and predict whether the wake-up word is present in a continuous audio signal. However, these keyword detection systems can only detect a fixed keyword. If we want to change the keyword or add a new keyword to the system, we have to collect new keyword-specific data and re-train the model.
In this thesis, we use a long short-term memory (LSTM) network and connectionist temporal classification (CTC) as the keyword detection model. It differs from general keyword detection in that the system uses the LSTM to predict phoneme posteriors and CTC to produce the probability of the phoneme sequence. Because the model predicts phoneme sequences, we can use non-keyword data as training data so that the model predicts sequences more accurately. In addition, when the wake-up word is changed, the system does not have to be re-trained; we only need some new wake-up word data to modify the system.
|
author2 |
王家慶 |
author_facet |
王家慶 YU-SIN JHOU 周郁馨 |
author |
YU-SIN JHOU 周郁馨 |
author_sort |
YU-SIN JHOU |
title |
Wake-up Word Detection Using Long Short Term Memory Network and Connectionist Temporal Classification |
publishDate |
2019 |
url |
http://ndltd.ncl.edu.tw/handle/tu5xzc |
_version_ |
1719274256812998656 |
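
The description above says that the LSTM predicts per-frame phoneme posteriors and that CTC turns them into a probability for a phoneme sequence. The following is a minimal sketch of that scoring step using the standard CTC forward algorithm; it is not taken from the thesis. The function names, the blank index, and the fixed decision threshold are assumptions made for illustration, and the posteriors here are hand-written numbers standing in for LSTM output.

```python
def ctc_sequence_probability(posteriors, target, blank=0):
    """Probability of `target` given per-frame posteriors (CTC forward pass).

    posteriors: list of T frames, each a list of per-symbol probabilities
                (in the record's setting these would come from the LSTM).
    target:     list of phoneme indices, without blanks.
    blank:      index of the CTC blank symbol (assumed to be 0 here).
    """
    if not posteriors:
        return 1.0 if not target else 0.0

    # Interleave blanks around every phoneme: [blank, p1, blank, p2, ..., blank].
    extended = [blank]
    for symbol in target:
        extended += [symbol, blank]
    size = len(extended)

    # alpha[s] = total probability of all alignments ending at extended[s].
    alpha = [0.0] * size
    alpha[0] = posteriors[0][extended[0]]
    if size > 1:
        alpha[1] = posteriors[0][extended[1]]

    for frame in posteriors[1:]:
        new_alpha = [0.0] * size
        for s in range(size):
            total = alpha[s]                      # stay on the same symbol
            if s >= 1:
                total += alpha[s - 1]             # advance by one position
            # Skipping a blank is allowed only between two different phonemes.
            if s >= 2 and extended[s] != blank and extended[s] != extended[s - 2]:
                total += alpha[s - 2]
            new_alpha[s] = total * frame[extended[s]]
        alpha = new_alpha

    # Valid alignments end on the last phoneme or on the trailing blank.
    prob = alpha[size - 1]
    if size > 1:
        prob += alpha[size - 2]
    return prob


def detects_wake_word(posteriors, keyword_phonemes, threshold):
    """Hypothetical decision rule: fire when the CTC score reaches a threshold."""
    return ctc_sequence_probability(posteriors, keyword_phonemes) >= threshold


# Two frames, two symbols (index 0 = blank, index 1 = one phoneme), uniform posteriors.
frames = [[0.5, 0.5], [0.5, 0.5]]
score = ctc_sequence_probability(frames, [1])
print(score)  # 0.75: the alignments "1 blank", "blank 1" and "1 1" each have probability 0.25
```

Because the score depends only on a phoneme sequence, changing the wake-up word in this sketch means passing a different `keyword_phonemes` list rather than training a new acoustic model, which matches the motivation given in the description. A practical system would work in log space to avoid underflow on long audio.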