Summary: | 碩士 === 國立中山大學 === 資訊工程學系研究所 === 97 === We build a two-pass Mandarin Automatic Speech Recognition (ASR) decoder on mobile device (PDA). The first-pass recognizing base syllable is implemented by discrete Hidden Markov Model (HMM) with time-synchronous, tree-lexicon Viterbi search. The second-pass dealing with language model, pronunciation lexicon and N-best syllable hypotheses from first-pass is implemented by Weighted Finite State Transducer (WFST). The best word sequence is obtained by shortest path algorithms over the composition result. This system limits the application in travel domain and it decouples the application of acoustic model and the application of language model into independent recognition passes. We report the real-time recognition performance performed on ASUS P565 with a 800MHz processor, 128MB RAM running Microsoft Window Mobile 6 operating system.
The 26-hour TCC-300 speech data is used to train 151 acoustic model. The 3-minute speech data recorded by reading the travel-domain transcriptions is used as the testing set for evaluating the performances (syllable, character accuracies) and real-time factors on PC and on PDA. The trained bi-gram model with 3500-word from BTEC corpus is used in second-pass.
In the first-pass, the best syllable accuracy is 38.8% given 30-best syllable hypotheses using continuous HMM and 26-dimension feature. Under the above syllable hypotheses and acoustic model, we obtain 27.6% character accuracy on PC after the second-pass.
|