Implementation of Embedded Mandarin SpeechRecognition System in Travel Domain
Master's === National Sun Yat-sen University === Graduate Institute of Computer Science and Engineering === 97 === We build a two-pass Mandarin automatic speech recognition (ASR) decoder on a mobile device (PDA). The first pass, which recognizes base syllables, is implemented with a discrete hidden Markov model (HMM) and a time-synchronous, tree-lexicon Viterbi search. The second-pass dea...
Main Authors: | Bo-han Chen, 陳柏含 |
---|---|
Other Authors: | Chia-Ping Chen |
Format: | Others |
Language: | en_US |
Published: | 2009 |
Online Access: | http://ndltd.ncl.edu.tw/handle/mwrs63 |
id |
ndltd-TW-097NSYS5392072 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-TW-097NSYS5392072 2019-05-30T03:49:41Z http://ndltd.ncl.edu.tw/handle/mwrs63 Implementation of Embedded Mandarin SpeechRecognition System in Travel Domain 基於旅遊對話運用嵌入式中文語音辨識系統之實作 Bo-han Chen 陳柏含 Master's National Sun Yat-sen University Graduate Institute of Computer Science and Engineering 97 We build a two-pass Mandarin automatic speech recognition (ASR) decoder on a mobile device (PDA). The first pass, which recognizes base syllables, is implemented with a discrete hidden Markov model (HMM) and a time-synchronous, tree-lexicon Viterbi search. The second pass, which combines the language model, the pronunciation lexicon, and the N-best syllable hypotheses from the first pass, is implemented with weighted finite-state transducers (WFSTs). The best word sequence is obtained by running a shortest-path algorithm over the composition result. The system is restricted to the travel domain, and it decouples the application of the acoustic model from the application of the language model by placing them in independent recognition passes. We report real-time recognition performance on an ASUS P565 with an 800 MHz processor and 128 MB of RAM, running the Microsoft Windows Mobile 6 operating system. The 26-hour TCC-300 speech corpus is used to train 151 acoustic models. A 3-minute set of speech, recorded by reading travel-domain transcriptions, serves as the test set for evaluating accuracy (syllable and character) and real-time factors on a PC and on the PDA. A bigram language model covering 3,500 words from the BTEC corpus is used in the second pass. In the first pass, the best syllable accuracy is 38.8%, obtained with 30-best syllable hypotheses, a continuous HMM, and 26-dimensional features. With these syllable hypotheses and this acoustic model, we obtain 27.6% character accuracy on the PC after the second pass. Chia-Ping Chen 陳嘉平 2009 thesis 65 en_US |
collection |
NDLTD |
language |
en_US |
format |
Others |
sources |
NDLTD |
description |
Master's === National Sun Yat-sen University === Graduate Institute of Computer Science and Engineering === 97 === We build a two-pass Mandarin automatic speech recognition (ASR) decoder on a mobile device (PDA). The first pass, which recognizes base syllables, is implemented with a discrete hidden Markov model (HMM) and a time-synchronous, tree-lexicon Viterbi search. The second pass, which combines the language model, the pronunciation lexicon, and the N-best syllable hypotheses from the first pass, is implemented with weighted finite-state transducers (WFSTs). The best word sequence is obtained by running a shortest-path algorithm over the composition result. The system is restricted to the travel domain, and it decouples the application of the acoustic model from the application of the language model by placing them in independent recognition passes. We report real-time recognition performance on an ASUS P565 with an 800 MHz processor and 128 MB of RAM, running the Microsoft Windows Mobile 6 operating system.
The 26-hour TCC-300 speech corpus is used to train 151 acoustic models. A 3-minute set of speech, recorded by reading travel-domain transcriptions, serves as the test set for evaluating accuracy (syllable and character) and real-time factors on a PC and on the PDA. A bigram language model covering 3,500 words from the BTEC corpus is used in the second pass.
In the first pass, the best syllable accuracy is 38.8%, obtained with 30-best syllable hypotheses, a continuous HMM, and 26-dimensional features. With these syllable hypotheses and this acoustic model, we obtain 27.6% character accuracy on the PC after the second pass.
|
author2 |
Chia-Ping Chen |
author_facet |
Chia-Ping Chen Bo-han Chen 陳柏含 |
author |
Bo-han Chen 陳柏含 |
author_sort |
Bo-han Chen |
title |
Implementation of Embedded Mandarin SpeechRecognition System in Travel Domain |
publishDate |
2009 |
url |
http://ndltd.ncl.edu.tw/handle/mwrs63 |
_version_ |
1719193467664465920 |
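
The description in this record says the second pass combines N-best syllable hypotheses, a pronunciation lexicon, and a bigram language model, then takes the shortest path. The sketch below illustrates that idea only. It is not the thesis's WFST implementation: it replaces transducer composition with plain dynamic programming over a small acyclic graph, and every name in it (`decode_second_pass`, `unseen_cost`, the toy syllables, words, and costs) is a hypothetical chosen for illustration. Costs are assumed to be negative log-probabilities, so lower is better.

```python
import math
from typing import Dict, List, Optional, Tuple

State = Tuple[int, str]  # (syllable position, last word emitted)


def decode_second_pass(
    hyps: List[Dict[str, float]],
    lexicon: Dict[str, Tuple[str, ...]],
    bigram: Dict[Tuple[str, str], float],
    unseen_cost: float = 10.0,
) -> Tuple[float, List[str]]:
    """Find the cheapest word sequence that spans all syllable positions.

    hyps[i] maps each candidate syllable at position i to its acoustic cost.
    lexicon maps a word to its syllable sequence.
    bigram maps (previous word, word) to a language-model cost; pairs that
    are missing fall back to unseen_cost (an assumed flat penalty).
    """
    n = len(hyps)
    # best[state] = (lowest cost to reach state, predecessor state)
    best: Dict[State, Tuple[float, Optional[State]]] = {(0, "<s>"): (0.0, None)}
    for pos in range(n):
        # Every word consumes at least one syllable, so all states at `pos`
        # are final by the time we expand them (topological order).
        states = [(key, val[0]) for key, val in best.items() if key[0] == pos]
        for (_, prev), cost in states:
            for word, sylls in lexicon.items():
                end = pos + len(sylls)
                if not sylls or end > n:
                    continue
                try:
                    acoustic = sum(hyps[pos + i][s] for i, s in enumerate(sylls))
                except KeyError:
                    continue  # a syllable of this word is not among the hypotheses
                total = cost + acoustic + bigram.get((prev, word), unseen_cost)
                key = (end, word)
                if key not in best or total < best[key][0]:
                    best[key] = (total, (pos, prev))
    finals = [(val[0], key) for key, val in best.items() if key[0] == n]
    if not finals:
        return math.inf, []
    cost, key = min(finals)
    words: List[str] = []
    while key[1] != "<s>":  # follow backpointers to the start state
        words.append(key[1])
        key = best[key][1]
    return cost, words[::-1]


# Toy data: acoustics slightly prefer "you" at position 1,
# but the bigram costs favor "wo yao qu".
toy_hyps = [{"wo": 1.0, "wu": 1.2}, {"yao": 0.9, "you": 0.8}, {"qu": 0.5}]
toy_lexicon = {
    "wo": ("wo",), "wu": ("wu",), "yao": ("yao",),
    "you": ("you",), "qu": ("qu",), "yaoqu": ("yao", "qu"),
}
toy_bigram = {
    ("<s>", "wo"): 0.5, ("<s>", "wu"): 3.0,
    ("wo", "yao"): 0.5, ("wo", "you"): 2.0,
    ("yao", "qu"): 0.5, ("you", "qu"): 2.0,
}
toy_cost, toy_words = decode_second_pass(toy_hyps, toy_lexicon, toy_bigram)
print(toy_words, round(toy_cost, 3))
```

In the toy data the language-model costs override the acoustic preference at the second syllable, which is the kind of correction a separate second pass is meant to provide. A WFST toolkit would reach the same result by composing the hypothesis, lexicon, and grammar transducers and running its shortest-path routine.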