Summary: | Doctoral === National Chiao Tung University === Department of Communication Engineering === 89 === This study focuses on two issues, dialect identification and spoken message
indexing, both of which are necessary steps in designing spoken language systems
for multilingual information access.
The first part of this study presents three approaches that employ varying
degrees of linguistic traits to evaluate their relative contributions towards Chinese
dialect identification. The first approach is based on phonotactic analysis
following phonetic tokenization, the second on pitch contour dynamics, and the
third on a combination of segmental and prosodic features.
Incorporating prosodic information is important because Chinese syllables
may have the same phonetic composition but different lexical meanings
when spoken with different tones.
Simulation results indicate that the proposed composite hidden Markov model
integrates this information effectively: it discriminates among three major
Chinese dialects spoken in Taiwan with 89.3% accuracy.
Also proposed is a new stochastic model, the Gaussian mixture bigram model (GMBM),
which better characterizes the temporal correlation between acoustic feature frames.
The main attraction of GMBMs is that the observations used in dialect-specific
modeling are extracted directly from the acoustic features, so the model
parameters can be estimated without any transcription of the training
utterances. For greater efficiency, a minimum classification error algorithm
is employed for discriminative training of a GMBM-based dialect identification
system.
The second part of this study addresses the general task of automatically indexing
spoken messages when no information about the language is available.
This task is accomplished by partitioning the unlabeled speech messages into
segments containing only one language and by grouping acoustically homogeneous
segments into one-language clusters.
Approaches to language-based segmentation are presented that combine GMBM modeling
of language acoustics with different dissimilarity measures.
For language clustering, the merits of a new scheme based on vector clustering
are explored in comparison with conventional hierarchical clustering techniques.
|