Speech Evaluation

Bibliographic Details
Main Authors: Chun-Yi Lee, 李俊毅
Other Authors: Jyh-Shing Roger Jang
Format: Others
Language: zh-TW
Published: 2002
Online Access: http://ndltd.ncl.edu.tw/handle/94181476992573057078
Description
Summary: Master's thesis === National Tsing Hua University (國立清華大學) === Department of Computer Science (資訊工程學系) === Academic year 90 === This thesis discusses several methods for speech evaluation, the study of how a computer can assess the content, fluency, and intonation of speech. It draws on techniques from audio signal processing and speech recognition. To develop an appropriate and consistent speech evaluation system, we define several useful speech features and perform several experiments on feature matching methods. The thesis has two parts: “Evaluation using standard speech” and “Evaluation using HMM and pitch contour”.

“Evaluation using standard speech” evaluates the similarity between a test utterance and the corresponding standard utterance. We use various approaches for speech feature extraction, pattern matching, and similarity computation. In particular, we use the magnitude contour, the pitch contour, and mel-frequency cepstral coefficients (MFCCs) as the features from which a similarity score is generated. Magnitude contours represent variations in volume, pitch contours represent variations in pitch, and MFCCs represent the content of the speech.

“Evaluation using HMM and pitch contour” is another speech evaluation paradigm, one that does not require a standard utterance. Instead, we evaluate a test utterance based on its similarity to hidden Markov models (HMMs) and tone models. Viterbi decoding is used to segment a continuous sentence into its individual characters. The score of each character is then computed from its ranking among the 411 possible syllables and from a tone recognition system.
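
To make the "Evaluation using standard speech" idea concrete, the following is a minimal sketch of comparing the magnitude contours of a test utterance and a standard utterance. It is not the thesis's implementation. The summary does not name the pattern matching method, so dynamic time warping (DTW) is used here purely as an assumption; the frame size, hop size, peak normalization, and the distance-to-score mapping are likewise hypothetical choices made for illustration.

```python
import math


def magnitude_contour(samples, frame_size=256, hop=128):
    """Per-frame volume: sum of absolute amplitudes after removing the frame mean.

    frame_size and hop are illustrative values, not taken from the thesis.
    """
    contour = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = samples[start:start + frame_size]
        mean = sum(frame) / frame_size
        contour.append(sum(abs(x - mean) for x in frame))
    return contour


def normalize(contour):
    """Scale a contour so its peak is 1.0, to reduce the effect of recording gain."""
    peak = max(contour) if contour else 0.0
    return [v / peak for v in contour] if peak > 0 else list(contour)


def dtw_distance(a, b):
    """Classic DTW with absolute difference as the local cost.

    The accumulated cost is divided by len(a) + len(b) so that long
    utterances are not penalized simply for being long.
    """
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("both sequences must be non-empty")
    d = [[math.inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m] / (n + m)


def similarity_score(test_samples, standard_samples):
    """Map the DTW distance to a score in (0, 100].

    The mapping 100 / (1 + distance) is a hypothetical choice; the
    thesis's actual scoring function is not given in the summary.
    """
    test = normalize(magnitude_contour(test_samples))
    standard = normalize(magnitude_contour(standard_samples))
    return 100.0 / (1.0 + dtw_distance(test, standard))
```

The same matching step could be applied to pitch contours or to sequences of MFCC vectors by replacing the local cost with a distance suited to that feature, for example a Euclidean distance between MFCC frames.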