Summary: | Master's === National Chiao Tung University === Institute of Communication Engineering === 84 === The design and implementation of an RNN-based speech recognizer
for large-vocabulary isolated Mandarin words on a Pentium PC with
a SoundBlaster add-on card and the Windows 95 environment are presented in
this thesis. The recognizer is functionally divided into two parts:
pre-processing and word recognition. In pre-processing, a small RNN
is first used to discriminate input speech from the background
silence. Driven by the output of this RNN, a finite state machine
is then used to determine all word boundaries. State-dependent
constraints are then added to skip some feature-extraction
computations, which reduces the CPU load. Once the
end-of-utterance state is entered, word recognition
using an RNN base-syllable recognizer and an RNN tone recognizer
is performed to determine the best N word candidates. Because
the pre-processing runs in real time, an
average waiting time of 0.876 seconds for word recognition is
achieved. Recognition rates of 20.6%, 51.6%, 88.4%, 95.2%,
and 100% are obtained for one- to five-syllable words,
respectively.
|
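
The boundary-detection step described in the summary can be illustrated with a small finite state machine driven by per-frame speech/silence scores. The following is only a minimal sketch, not the thesis's actual design: the state names, the `detect_endpoints` function, the 0.5 threshold, and the `min_speech` / `min_silence` frame counts are all assumptions made for illustration, and the per-frame scores stand in for the output of the small RNN.

```python
from enum import Enum, auto


class State(Enum):
    # Hypothetical state set; the abstract does not list the actual states.
    SILENCE = auto()   # waiting for speech to start
    ONSET = auto()     # speech-like frames seen, start not yet confirmed
    SPEECH = auto()    # inside an utterance
    TRAILING = auto()  # silence-like frames seen, end not yet confirmed
    END = auto()       # end of utterance confirmed


def detect_endpoints(speech_probs, threshold=0.5, min_speech=3, min_silence=5):
    """Return (start_frame, end_frame) of the first utterance, or None.

    speech_probs: per-frame speech scores in [0, 1] (assumed detector output).
    end_frame is exclusive. Assumes min_speech >= 2 and min_silence >= 2.
    """
    state = State.SILENCE
    start = end = None
    run = 0  # length of the current run of frames being confirmed
    for i, p in enumerate(speech_probs):
        is_speech = p >= threshold
        if state is State.SILENCE:
            if is_speech:
                state, start, run = State.ONSET, i, 1
        elif state is State.ONSET:
            if is_speech:
                run += 1
                if run >= min_speech:
                    state = State.SPEECH
            else:
                # Too short to be speech: treat it as a noise burst.
                state, start = State.SILENCE, None
        elif state is State.SPEECH:
            if not is_speech:
                state, end, run = State.TRAILING, i, 1
        elif state is State.TRAILING:
            if is_speech:
                # Short pause inside the utterance: keep going.
                state = State.SPEECH
            else:
                run += 1
                if run >= min_silence:
                    state = State.END
                    break
    return (start, end) if state is State.END else None


if __name__ == "__main__":
    scores = [0.1, 0.1, 0.9, 0.1, 0.1, 0.9, 0.9, 0.9, 0.9,
              0.2, 0.8, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1]
    print(detect_endpoints(scores))
```

In this sketch the isolated burst at frame 2 is rejected as noise and the one-frame dip at frame 9 is absorbed into the utterance; the confirmation counters are what provide that hysteresis. Reaching the `END` state is the point at which a recognizer of the kind described would start word recognition.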