Improving The Feature Representation of Speech Signals by Self-Organization

Master's thesis === National Taiwan University === Graduate Institute of Computer Science and Information Engineering === ROC academic year 82 (1993-94)

Bibliographic Details
Main Author: Chien, Shuen-Der (簡順德)
Other Author: Liou, Cheng-Yuan (劉長遠)
Format: Others
Language: zh-TW
Published: 1994
Online Access: http://ndltd.ncl.edu.tw/handle/27455552395855493397
id ndltd-TW-082NTU00392055
record_format oai_dc
spelling ndltd-TW-082NTU00392055 2016-07-18T04:09:33Z http://ndltd.ncl.edu.tw/handle/27455552395855493397 Improving The Feature Representation of Speech Signals by Self-Organization 語音特徵的時序扭曲校正方法 (A Method for Correcting Temporal Warping of Speech Features) Chien, Shuen-Der 簡順德 Master's thesis National Taiwan University Graduate Institute of Computer Science and Information Engineering 82 Liou, Cheng-Yuan 劉長遠 1994 thesis 59 zh-TW
collection NDLTD
language zh-TW
format Others
sources NDLTD
description Master's thesis === National Taiwan University === Graduate Institute of Computer Science and Information Engineering === ROC academic year 82 (1993-94) === This thesis describes a continuous Mandarin speech recognition system in which neural networks are applied to the design of specific modules. Fifteen-five Mel-scale spectral coefficients are used to represent the spectral features of spoken utterances. The prototype for each word is modeled by a one-dimensional self-organizing feature map consisting of 100 equally spaced neurons (cells). Because the topological map develops on a linear array of neurons, the precedence relations (temporal order) among the sequential spectral features are preserved, so the linear array handles the time-alignment problem implicitly. Two perception energies, E1 and E2, were designed experimentally for pattern matching. E1, which measures the distance similarity between a prototype and a word utterance, is obtained by accumulating the total excitation on the feature map over the utterance. E2, which measures the timing similarity between a prototype and a time-warped word utterance, is obtained by fitting a precedence curve to the sequence of excitation patterns on the feature map over the utterance duration. In addition, two novel self-organizing algorithms, relaxation and topological adjustment of a one-dimensional prototype in a two-dimensional space, are proposed to improve the resolution of E1 and E2. The relaxation process improves the resolution of E1 through more finely tuned training; because key neurons selected from the fine-tuned 2-D map serve as the new prototype, the computational load does not increase. The topological adjustment process improves the resolution of E2 by narrowing the range of slopes of the sequential excitation curve. The concepts and methods presented here were simulated on a personal computer equipped with a DSP board, and the results were quite satisfactory.
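The record only summarizes the method, so the following Python sketch is hypothetical: it trains a one-dimensional self-organizing map as a word prototype and computes stand-ins for E1 and E2. The excitation function exp(-d^2 / tau), the value of `tau`, the time-ordered initialization, the learning-rate and neighborhood schedule, and the reading of E2 as the slope and RMS residual of a least-squares line through (normalized time, best-matching neuron index) are all assumptions, not definitions taken from the thesis. The names `train_prototype`, `energy_e1`, and `energy_e2` are illustrative, and the relaxation and topological-adjustment algorithms are not sketched.

```python
# Hypothetical sketch of a 1-D SOM word prototype with stand-in E1/E2 scores.
# Excitation function, training schedule, and E2 line fit are assumptions.
import math
from typing import List, Sequence, Tuple

Frame = Sequence[float]  # one spectral feature vector (e.g. Mel-scale coefficients)


def sq_dist(a: Frame, b: Frame) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def bmu(x: Frame, weights: List[List[float]]) -> int:
    """Index of the best-matching (most excited) neuron for frame x."""
    return min(range(len(weights)), key=lambda i: sq_dist(x, weights[i]))


def train_prototype(frames: Sequence[Frame], n_neurons: int = 100,
                    epochs: int = 50, lr0: float = 0.5) -> List[List[float]]:
    """Train a 1-D Kohonen map on one word's frames (assumed schedule).

    Neurons start as frames sampled evenly in time, so the linear array is
    ordered along the utterance from the outset.
    """
    T = len(frames)
    weights = [list(frames[round(i * (T - 1) / (n_neurons - 1))])
               for i in range(n_neurons)]
    radius0 = n_neurons / 4
    total_steps = epochs * T
    step = 0
    for _ in range(epochs):
        for x in frames:  # frames presented in temporal order
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                     # linearly decaying rate
            radius = max(radius0 * (1.0 - frac), 0.5)   # shrinking neighborhood
            winner = bmu(x, weights)
            for i, w in enumerate(weights):
                # Gaussian neighborhood measured along the neuron index
                h = math.exp(-((i - winner) ** 2) / (2.0 * radius ** 2))
                for k in range(len(w)):
                    w[k] += lr * h * (x[k] - w[k])
            step += 1
    return weights


def energy_e1(frames: Sequence[Frame], weights: List[List[float]],
              tau: float = 0.05) -> float:
    """Assumed E1: excitation exp(-d^2 / tau) summed over all neurons and
    all frames, divided by the number of frames."""
    total = sum(math.exp(-sq_dist(x, w) / tau) for x in frames for w in weights)
    return total / len(frames)


def energy_e2(frames: Sequence[Frame],
              weights: List[List[float]]) -> Tuple[float, float]:
    """Assumed E2: least-squares line through (normalized time, BMU index).

    Returns (slope, rms_residual).
    """
    T = len(frames)
    ts = [t / (T - 1) for t in range(T)]
    idx = [bmu(x, weights) for x in frames]
    t_mean, i_mean = sum(ts) / T, sum(idx) / T
    sxx = sum((t - t_mean) ** 2 for t in ts)
    sxy = sum((t - t_mean) * (i - i_mean) for t, i in zip(ts, idx))
    slope = sxy / sxx
    intercept = i_mean - slope * t_mean
    rms = math.sqrt(sum((i - intercept - slope * t) ** 2
                        for t, i in zip(ts, idx)) / T)
    return slope, rms


if __name__ == "__main__":
    # Synthetic "word": 30 two-dimensional frames moving along a straight line.
    word = [[t / 29, 1 - t / 29] for t in range(30)]
    proto = train_prototype(word, n_neurons=20)
    print(energy_e1(word, proto), energy_e2(word, proto))
    print(energy_e2(word[::-1], proto))  # time-reversed utterance
```

Because the neurons start in temporal order, frames of the training utterance should map to increasing neuron indices (positive slope), while a time-reversed copy of the same frames should give a negative slope. This is one way a linear array can encode precedence without explicit time alignment.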
author2 Liou, Cheng-Yuan
author Chien, Shuen-Der
簡順德
author_sort Chien, Shuen-Der
title Improving The Feature Representation of Speech Signals by Self-Organization
title_sort improving the feature representation of speech signals by self-organization
publishDate 1994
url http://ndltd.ncl.edu.tw/handle/27455552395855493397