Improving The Feature Representation of Speech Signals by Self-Organization



Bibliographic Details
Main Authors: Chien, Shuen-Der, 簡順德
Other Authors: Liou, Cheng-Yuan
Format: Others
Language: zh-TW
Published: 1994
Online Access: http://ndltd.ncl.edu.tw/handle/27455552395855493397
Description
Summary: Master's thesis, National Taiwan University, Graduate Institute of Computer Science and Information Engineering, academic year 82 (ROC calendar, 1993–94).

In this paper, a continuous Mandarin speech recognition system is described in which neural networks are applied to the design of specific modules. Fifteen-five Mel-scale spectral coefficients are used to represent the spectral features of spoken utterances. The prototype for each word is modeled by a one-dimensional self-organizing feature map consisting of 100 equally spaced neurons (cells). Because the topological map develops on a linear array of neurons, the precedence relations among the sequential spectral features are preserved, so the linear array handles the time-alignment problem implicitly.

Two perception energies, E1 and E2, are designed experimentally for pattern matching. The first, E1, measures the distance similarity between a prototype and a word utterance and is obtained by accumulating the total excitation on the feature map over the utterance. The second, E2, measures the timing similarity between a prototype and a time-warped word utterance and is computed by fitting a precedence curve to the sequence of excitation patterns on the feature map over the duration of the utterance.

Furthermore, two novel self-organizing algorithms, namely relaxation and topological adjustment of the one-dimensional prototype in a two-dimensional space, are proposed to improve the resolution of E1 and E2. The relaxation process improves the resolution of E1 through finer-tuned training; because key neurons are selected from the fine-tuned 2-D map to form the new prototype, the computational load does not increase. The topological adjustment process improves the resolution of E2 by narrowing the slope range of the sequential excitation curve. The concepts and methods presented in this paper are simulated on a personal computer equipped with a modern DSP board, and the results are quite satisfactory.
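The abstract does not spell out the training schedule, the excitation function, or the exact formulas for E1 and E2, so the Python sketch below is only an illustration under stated assumptions. It trains a 1-D Kohonen map of 100 cells for one word, takes a cell's excitation to be a Gaussian of its distance to the current frame (the E1 stand-in sums these over all cells and frames), and uses a least-squares straight line through the winner position versus normalized time as the "precedence curve" (the E2 stand-in reports its slope and R²). The function names (`train_prototype`, `energy_e1`, `energy_e2`, `word`), the Gaussian `width`, the learning-rate and neighborhood schedules, and the 2-D synthetic trajectories standing in for Mel-scale coefficients are all hypothetical; the relaxation and topological-adjustment algorithms are not modeled.

```python
import math
from typing import List, Sequence, Tuple

Frame = List[float]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def winner(x: Sequence[float], cells: List[Frame]) -> int:
    """Index of the best-matching cell on the linear array."""
    return min(range(len(cells)), key=lambda i: sq_dist(x, cells[i]))


def train_prototype(utterances: List[List[Frame]], n_cells: int = 100,
                    epochs: int = 20, lr0: float = 0.5,
                    sigma0: float = 10.0) -> List[Frame]:
    """Train a 1-D Kohonen map (a linear array of n_cells) on every frame
    of the training utterances of one word."""
    # Assumption: cells start as a uniform time resampling of the first
    # utterance, so cell order already follows time order.
    first = utterances[0]
    last = len(first) - 1
    cells = [list(first[round(i * last / (n_cells - 1))]) for i in range(n_cells)]
    frames = [f for utt in utterances for f in utt]
    total, step = epochs * len(frames), 0
    for _ in range(epochs):
        for x in frames:
            frac = step / total
            lr = lr0 * (1.0 - frac)                  # linearly decaying gain
            sigma = max(sigma0 * (1.0 - frac), 0.5)  # shrinking neighborhood
            c = winner(x, cells)
            for i, w in enumerate(cells):
                # Gaussian neighborhood measured along the 1-D array index.
                h = math.exp(-((i - c) ** 2) / (2.0 * sigma ** 2))
                for d in range(len(w)):
                    w[d] += lr * h * (x[d] - w[d])
            step += 1
    return cells


def energy_e1(utt: List[Frame], cells: List[Frame], width: float = 0.1) -> float:
    """Stand-in for E1: total map excitation accumulated over the utterance.
    Assumption: a cell's excitation is a Gaussian of its distance to the frame."""
    return sum(math.exp(-sq_dist(x, w) / (2.0 * width ** 2))
               for x in utt for w in cells)


def energy_e2(utt: List[Frame], cells: List[Frame]) -> Tuple[float, float]:
    """Stand-in for E2: fit a straight 'precedence line' to winner position
    versus time (both scaled to [0, 1]); return (slope, r_squared)."""
    n = len(utt)
    ts = [k / (n - 1) for k in range(n)]
    cs = [winner(x, cells) / (len(cells) - 1) for x in utt]
    mt, mc = sum(ts) / n, sum(cs) / n
    stt = sum((t - mt) ** 2 for t in ts)
    stc = sum((t - mt) * (c - mc) for t, c in zip(ts, cs))
    scc = sum((c - mc) ** 2 for c in cs)
    slope = stc / stt
    r2 = stc * stc / (stt * scc) if scc > 0 else 0.0
    return slope, r2


def word(n_frames: int, warp: float, offset: float = 0.0) -> List[Frame]:
    """Hypothetical 2-D 'utterance': a curved feature trajectory whose time
    axis is warped by the exponent `warp`."""
    frames = []
    for k in range(n_frames):
        s = (k / (n_frames - 1)) ** warp
        frames.append([s, s * s + offset])
    return frames


if __name__ == "__main__":
    proto = train_prototype([word(30, 0.8), word(40, 1.0), word(50, 1.25)], epochs=10)
    test = word(35, 1.1)
    print("E1 same word:", energy_e1(test, proto))
    print("E1 other word:", energy_e1(word(35, 1.1, offset=0.5), proto))
    print("E2 forward (slope, r2):", energy_e2(test, proto))
    print("E2 reversed (slope, r2):", energy_e2(test[::-1], proto))
```

Under these assumptions, a time-warped utterance of the trained word should give a much larger E1 than a shifted trajectory, and its fitted slope should be positive (roughly near 1 on these normalized axes), while the same frames played in reverse give a negative slope. This is one way to read how an ordered linear array absorbs time alignment implicitly, and why the spread of that slope matters for E2's resolution.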