Improving The Feature Representation of Speech Signals by Self-Organization
Master's === National Taiwan University === Graduate Institute of Computer Science and Information Engineering === 82 === This paper describes a continuous Mandarin speech recognition system in which neural networks are applied to the design of specific modules. Fifteen-five Mel-scale spectral coefficients are used to...
Main Authors: | Chien, Shuen-Der 簡順德 |
---|---|
Other Authors: | Liou, Cheng-Yuan 劉長遠 |
Format: | Others |
Language: | zh-TW |
Published: | 1994 |
Online Access: | http://ndltd.ncl.edu.tw/handle/27455552395855493397 |
id |
ndltd-TW-082NTU00392055 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-TW-082NTU00392055 2016-07-18T04:09:33Z http://ndltd.ncl.edu.tw/handle/27455552395855493397 Improving The Feature Representation of Speech Signals by Self-Organization 語音特徵的時序扭曲校正方法 (A Time-Warping Correction Method for Speech Features) Chien, Shuen-Der 簡順德 Master's National Taiwan University Graduate Institute of Computer Science and Information Engineering 82 Liou, Cheng-Yuan 劉長遠 1994 thesis 59 zh-TW |
collection |
NDLTD |
language |
zh-TW |
format |
Others |
sources |
NDLTD |
description |
Master's === National Taiwan University === Graduate Institute of Computer Science and Information Engineering === 82 === This paper describes a continuous Mandarin speech recognition
system in which neural networks are applied to the design of
specific modules. Fifteen-five Mel-scale spectral coefficients
are used to represent the spectral features of spoken
utterances. The prototype for each word is modeled by a
one-dimensional self-organizing feature map consisting of 100
equally spaced neurons (cells). Because the topology map
develops on a linear array of neurons, the precedence relations
among the sequential spectral features are preserved, so the
linear array copes with the time-alignment problem implicitly.
Two perception energies, E1 and E2, are designed experimentally
for pattern matching. E1, which evaluates the similarity of
distance between a prototype and a word utterance, is obtained
by accumulating the total excitation on the feature map during
the word utterance. E2, which evaluates the similarity of
timing between a prototype and a time-warped word utterance, is
obtained by fitting a precedence curve to the sequential
excitation patterns of the feature map over the duration of the
utterance. Furthermore, two novel self-organizing algorithms,
relaxation and topological adjustment of a one-dimensional
prototype in two-dimensional space, are proposed to improve the
resolution of E1 and E2. The relaxation process improves the
resolution of E1 through finer-tuned training; because key
neurons are selected from the fine-tuned 2-D map to form the
new prototype, the computational load does not increase. The
topological adjustment process improves the resolution of E2 by
narrowing the slope range of the sequential excitation curve.
The concepts and methods presented in this paper were simulated
on a personal computer with a modern DSP board, and the results
are quite satisfactory.
|
author2 |
Liou, Cheng-Yuan |
author_facet |
Liou, Cheng-Yuan Chien, Shuen-Der 簡順德 |
author |
Chien, Shuen-Der 簡順德 |
spellingShingle |
Chien, Shuen-Der 簡順德 Improving The Feature Representation of Speech Signals by Self-Organization |
author_sort |
Chien, Shuen-Der |
title |
Improving The Feature Representation of Speech Signals by Self-Organization |
title_sort |
improving the feature representation of speech signals by self-organization |
publishDate |
1994 |
url |
http://ndltd.ncl.edu.tw/handle/27455552395855493397 |
work_keys_str_mv |
AT chienshuender improvingthefeaturerepresentationofspeechsignalsbyselforganization AT jiǎnshùndé improvingthefeaturerepresentationofspeechsignalsbyselforganization AT chienshuender yǔyīntèzhēngdeshíxùniǔqūxiàozhèngfāngfǎ AT jiǎnshùndé yǔyīntèzhēngdeshíxùniǔqūxiàozhèngfāngfǎ |
_version_ |
1718352402059362304 |
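The description field above outlines a one-dimensional self-organizing feature map used as a word prototype, with one score based on accumulated distance and another based on a curve fitted to the sequence of excited neurons. The following is a minimal sketch of that general idea, not the thesis's implementation. Everything in it is an assumption made for illustration: the function names, the time-ordered initialization, the learning-rate and neighborhood schedules, the three-dimensional synthetic frames, and the two scores, which are simple stand-ins for E1 and E2 (summed distance to the winning neuron, and the slope of a least-squares line through the winner indices).

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def best_match(weights, x):
    """Index of the neuron whose weight vector is closest to frame x."""
    return min(range(len(weights)), key=lambda i: sq_dist(weights[i], x))


def train_som_1d(frames, n_neurons=100, epochs=20, lr0=0.5, seed=0):
    """Train a linear array of neurons on one utterance (list of frames)."""
    rng = random.Random(seed)
    n_frames = len(frames)
    # Assumption: neuron i starts at the frame with the same relative position in time.
    weights = [list(frames[round(i * (n_frames - 1) / max(n_neurons - 1, 1))])
               for i in range(n_neurons)]
    for epoch in range(epochs):
        frac = 1.0 - epoch / epochs                  # decays from 1 toward 0
        lr = lr0 * frac                              # learning rate schedule
        radius = max(1.0, n_neurons / 10.0 * frac)   # neighborhood width
        order = list(range(n_frames))
        rng.shuffle(order)
        for t in order:
            x = frames[t]
            win = best_match(weights, x)
            for i, w in enumerate(weights):
                # Gaussian neighborhood along the 1-D array of neurons.
                h = math.exp(-((i - win) ** 2) / (2.0 * radius ** 2))
                for d in range(len(w)):
                    w[d] += lr * h * (x[d] - w[d])
    return weights


def winner_sequence(weights, frames):
    """Winning neuron index for each frame, in time order."""
    return [best_match(weights, x) for x in frames]


def accumulated_distance(weights, frames):
    """Stand-in for a distance score: summed distance to the winning neuron."""
    return sum(math.sqrt(sq_dist(weights[best_match(weights, x)], x))
               for x in frames)


def fit_line_slope(winners):
    """Stand-in for a timing score: least-squares slope of winner index vs. time."""
    n = len(winners)
    mean_t = (n - 1) / 2.0
    mean_w = sum(winners) / n
    cov = sum((t - mean_t) * (w - mean_w) for t, w in enumerate(winners))
    var = sum((t - mean_t) ** 2 for t in range(n))
    return cov / var


# Synthetic "utterance": 50 frames of a smooth 3-D trajectory (hypothetical data).
utterance = [[math.sin(0.1 * t), math.cos(0.1 * t), t / 50.0] for t in range(50)]
prototype = train_som_1d(utterance, n_neurons=20)
winners = winner_sequence(prototype, utterance)
print("winners:", winners)
print("distance score:", accumulated_distance(prototype, utterance))
print("fitted slope:", fit_line_slope(winners))
```

For a smooth input like this one, the winner indices would be expected to advance along the array as time progresses, which is the property the abstract relies on for implicit time alignment. A real system would use Mel-scale spectral frames and 100 neurons per word prototype, as the abstract states.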