Summary: | Master's === National Taiwan University === Graduate Institute of Computer Science and Information Engineering === 82 === In this paper, a continuous Mandarin speech recognition system
in which neural networks are applied to the design of specific
modules is described. Fifteen-five Mel-scale spectral
coefficients are used to represent the spectral features of
spoken utterances. The prototype for each word is modeled by a
one-dimensional self-organizing feature map that consists of
100 equally spaced neurons (cells). With the topology map
developed on the linear array of neurons, the precedence
relations among the sequential spectral features are preserved.
Hence, the linear array of neurons is able to cope
with the time alignment problem implicitly. Two perception
energies E1 and E2 are experimentally designed for
implementation of pattern matching. The first perception energy
E1, which evaluates the similarity of distance between a
prototype and a word utterance, is obtained from the
accumulation of total excitations on the feature map during a
word utterance. The other perception energy E2, which evaluates
the similarity of timing between a prototype and a time-warped
word utterance, is devised by properly fitting a precedence
curve to the sequential excitation patterns of the feature map
over the duration of an utterance. Furthermore, two novel
self-organizing algorithms, relaxation and topological adjustment of
a one-dimensional prototype in two-dimensional space, are
proposed to improve the resolution of E1 and E2. The
relaxation process improves the resolution of E1 through more
finely tuned training. By properly selecting key neurons from a
fine-tuned 2-D map as a new prototype, the computational load is
not increased. Moreover, the topological adjustment process
improves the resolution of E2 by narrowing the slope range of a
sequential excitation curve. The concepts and methods presented
in this paper are simulated on a personal computer with a
modern DSP board, and the results are quite satisfactory.
|
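
The summary gives no formulas for the feature map or for E1 and E2, so the following is only a rough sketch of the ideas it describes, not the thesis's method. It assumes the standard Kohonen update for the linear array of neurons, uses exp(-d^2) of the winning neuron as a stand-in "excitation" for E1, and uses a straight line as a stand-in "precedence curve" for E2. All function names and parameter values (`train_1d_som`, `energy_e1`, `energy_e2`, the learning-rate and radius schedules) are hypothetical.

```python
import math
import random


def dist2(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def winner(som, frame):
    """Index of the neuron whose weight vector is closest to the frame."""
    return min(range(len(som)), key=lambda i: dist2(som[i], frame))


def train_1d_som(frames, n_neurons=100, epochs=20, seed=0):
    """Train a linear array of neurons with the standard Kohonen rule.

    frames: spectral feature vectors of one word utterance, in time order
    (at least two frames are expected).
    """
    rng = random.Random(seed)
    # Assumption: start from frames sampled in time order (plus tiny noise),
    # so neighbouring neurons already correspond to neighbouring times.
    som = []
    for i in range(n_neurons):
        src = frames[i * (len(frames) - 1) // max(n_neurons - 1, 1)]
        som.append([v + rng.uniform(-1e-3, 1e-3) for v in src])
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = 0.3 * (1 - frac) + 0.01                    # assumed decay schedule
        radius = max(1.0, n_neurons / 10 * (1 - frac))  # shrinking neighbourhood
        for frame in frames:
            w = winner(som, frame)
            for i in range(n_neurons):
                # Gaussian neighbourhood along the one-dimensional array.
                h = math.exp(-((i - w) ** 2) / (2 * radius ** 2))
                som[i] = [wi + lr * h * (x - wi) for wi, x in zip(som[i], frame)]
    return som


def energy_e1(som, frames):
    """Stand-in for E1: excitation of the winning neuron, summed over frames."""
    return sum(math.exp(-dist2(som[winner(som, f)], f)) for f in frames)


def energy_e2(som, frames):
    """Stand-in for E2: how well a straight line fits winner index vs. time.

    Returns a value in (0, 1]; 1 means the winners advance perfectly linearly.
    """
    path = [winner(som, f) for f in frames]
    n = len(path)
    t_mean = (n - 1) / 2
    p_mean = sum(path) / n
    var_t = sum((t - t_mean) ** 2 for t in range(n))
    slope = sum((t - t_mean) * (p - p_mean) for t, p in enumerate(path)) / var_t
    mse = sum((p - p_mean - slope * (t - t_mean)) ** 2
              for t, p in enumerate(path)) / n
    return 1.0 / (1.0 + math.sqrt(mse))
```

In this sketch a prototype would be trained once per word with `train_1d_som`, and an unknown utterance would be scored against each prototype with `energy_e1` (spectral closeness) and `energy_e2` (temporal ordering). How the two energies are combined, and how the relaxation and topological adjustment steps refine them, is not stated in the summary and is left out here.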