Automatic Identification and Indexing of Chinese Multilingual Spoken Messages

Bibliographic Details
Main Authors: Wei-Ho Tsai, 蔡偉和
Other Authors: Wen-Whei Chang
Format: Others
Language: en_US
Published: 2001
Online Access: http://ndltd.ncl.edu.tw/handle/40583607492665665191
Description
Summary: Doctoral === National Chiao Tung University === Department of Communication Engineering === 89 === This study focuses on two issues, dialect identification and spoken message indexing, both of which are necessary steps in designing spoken language systems for multilingual information access. The first part of the study presents three approaches that employ varying degrees of linguistic traits in order to evaluate their relative contributions to Chinese dialect identification. The first approach is based on phonotactic analysis following phonetic tokenization, the second on pitch contour dynamics, and the third on a combination of segmental and prosodic features. Prosodic information is important because Chinese syllables may have the same phonetic composition but different lexical meanings when spoken with different tones. Simulation results indicate that the proposed composite hidden Markov model is very effective at information integration and can discriminate among three major Chinese dialects spoken in Taiwan with 89.3% accuracy. Also proposed is a new stochastic model, the Gaussian mixture bigram model (GMBM), which better characterizes the time correlation among acoustic feature frames. The main attraction of GMBMs is that the observations used in dialect-specific modeling are extracted directly from the acoustic features, so the model parameters can be estimated without any transcription of the training utterances. For greater efficiency, a minimum classification error algorithm is employed for discriminative training of a GMBM-based dialect identification system.
The second part of the study addresses the general task of automatically indexing spoken messages when no information is available regarding the language. This task is accomplished by partitioning the unlabeled speech messages into segments containing only one language and by grouping acoustically homogeneous segments into one-language clusters. Approaches to language-based segmentation are presented based on GMBM modeling of language acoustics in conjunction with different dissimilarity measurements. For language clustering, the merits of a new scheme based on vector clustering are explored in comparison with conventional hierarchical clustering techniques.
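The phonotactic approach mentioned in the summary (phonetic tokenization followed by phonotactic analysis) can be illustrated with a minimal sketch. It assumes the speech has already been converted to a sequence of phonetic tokens and scores that sequence against one smoothed bigram model per dialect. The function names, the add-alpha smoothing, the toy token symbols, and the labels `dialect_A` and `dialect_B` are all hypothetical choices made for illustration; the record does not describe the thesis's actual models or data.

```python
import math
from collections import Counter


def train_bigram(sequences, vocab, alpha=1.0):
    """Return a log-probability function for an add-alpha smoothed token bigram model."""
    pair_counts = Counter()
    context_counts = Counter()
    for seq in sequences:
        padded = ["<s>"] + list(seq)  # "<s>" marks the start of an utterance
        for prev, cur in zip(padded, padded[1:]):
            pair_counts[(prev, cur)] += 1
            context_counts[prev] += 1
    vocab_size = len(vocab)

    def logprob(prev, cur):
        num = pair_counts[(prev, cur)] + alpha
        den = context_counts[prev] + alpha * vocab_size
        return math.log(num / den)

    return logprob


def score(seq, logprob):
    """Average per-token log-likelihood of a token sequence under one model."""
    padded = ["<s>"] + list(seq)
    total = sum(logprob(p, c) for p, c in zip(padded, padded[1:]))
    return total / max(len(seq), 1)


def identify(seq, models):
    """Pick the label whose bigram model gives the highest average log-likelihood."""
    return max(models, key=lambda label: score(seq, models[label]))


# Toy token sequences (hypothetical; real input would come from a phonetic tokenizer).
vocab = {"a", "b"}
models = {
    "dialect_A": train_bigram([["a", "b", "a", "b"], ["a", "b", "a"]], vocab),
    "dialect_B": train_bigram([["b", "b", "a", "a"], ["b", "b", "a"]], vocab),
}

print(identify(["a", "b", "a", "b"], models))
print(identify(["b", "b", "a"], models))
```

Averaging the log-likelihood over the token count keeps scores comparable across utterances of different lengths; the smoothing constant `alpha` is an assumption and would need tuning on real data.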