Summary: | Master's === Tatung University (大同大學) === Department of Computer Science and Engineering === 95 === This thesis addresses emotion recognition from noisy Mandarin speech and proposes a method designed to improve the recognition of emotions in Mandarin speech under different degrees of noise. Recognizing emotions in speech is one of the challenges in speech signal processing research. In particular, the selection of a feature set is arguably the most critical part of developing an application of this kind: a proper choice of acoustic features can improve the performance of a Mandarin emotion recognition system.
To overcome the disturbance of noise, we developed a Mandarin emotion recognition method that combines a set of acoustic features using a Weighted-Discrete-K-Nearest Neighborhood (Weighted-D-KNN) classifier. In the experiments, Mel-Frequency Cepstral Coefficients (MFCC), Linear Prediction Cepstral Coefficients (LPCC), Log Frequency Power Coefficients (LFPC), and Relative Spectral PLP (Rasta-PLP) are selected as the recognition features. Five emotions are investigated: anger, happiness, sadness, boredom, and neutral.
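As a rough illustration of classifying a combined feature vector by a weighted nearest-neighbor vote, the sketch below implements a generic distance-weighted K-nearest-neighbor classifier. It is not the Weighted-D-KNN formulation of this thesis, whose weighting scheme is not described in the abstract. The function name `weighted_knn_predict`, the inverse-distance weights, and the toy two-dimensional vectors are all assumptions made for illustration; in practice each vector would hold concatenated MFCC, LPCC, LFPC, and Rasta-PLP values.

```python
import math
from collections import defaultdict


def weighted_knn_predict(train, query, k=3):
    """Generic distance-weighted KNN (illustrative, not the thesis's Weighted-D-KNN).

    train: list of (feature_vector, label) pairs
    query: feature vector of the same length as the training vectors
    """
    # Euclidean distance from the query to every training vector, nearest first.
    dists = sorted((math.dist(vec, query), label) for vec, label in train)

    # Each of the k nearest neighbors votes with weight 1/distance (assumed scheme);
    # the small constant avoids division by zero for exact matches.
    votes = defaultdict(float)
    for dist, label in dists[:k]:
        votes[label] += 1.0 / (dist + 1e-9)

    # The label with the largest accumulated weight wins.
    return max(votes, key=votes.get)


# Hypothetical toy data standing in for real acoustic feature vectors.
train = [
    ([0.0, 0.0], "neutral"),
    ([0.0, 1.0], "neutral"),
    ([5.0, 5.0], "anger"),
    ([5.0, 6.0], "anger"),
    ([6.0, 5.0], "anger"),
]
print(weighted_knn_predict(train, [0.2, 0.1]))
print(weighted_knn_predict(train, [5.0, 5.4]))
```

Weighting votes by inverse distance lets a close neighbor outweigh several distant ones, which is one common reason to prefer a weighted vote over a plain majority vote.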
Using the MFCC, LPCC, Rasta-PLP, and LFPC features, an average recognition accuracy of over 80% can be achieved even when the Signal-to-Noise Ratio (SNR) is between 40 dB and 50 dB.
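To make the SNR figures concrete, the following sketch shows one way to corrupt a clean signal with noise at a chosen SNR, using the standard definition SNR(dB) = 10·log10(P_signal / P_noise). The abstract does not state the noise type or mixing procedure used in the experiments, so the white Gaussian noise, the helper names `add_noise_at_snr` and `measure_snr_db`, and the synthetic sine wave standing in for speech are assumptions.

```python
import math
import random


def add_noise_at_snr(signal, snr_db, seed=0):
    """Add white Gaussian noise (assumed noise type) scaled to a target SNR in dB."""
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in signal]

    # Average power of the clean signal and of the unscaled noise.
    p_signal = sum(x * x for x in signal) / len(signal)
    p_noise = sum(n * n for n in noise) / len(noise)

    # Scale the noise so that 10*log10(p_signal / p_scaled_noise) equals snr_db.
    scale = math.sqrt(p_signal / (p_noise * 10 ** (snr_db / 10)))
    return [x + scale * n for x, n in zip(signal, noise)]


def measure_snr_db(clean, noisy):
    """Measure the SNR in dB of a noisy signal against its clean reference."""
    p_signal = sum(x * x for x in clean)
    p_noise = sum((y - x) ** 2 for x, y in zip(clean, noisy))
    return 10 * math.log10(p_signal / p_noise)


# Synthetic 440 Hz tone sampled at 8 kHz, standing in for a speech recording.
clean = [math.sin(2 * math.pi * 440 * t / 8000) for t in range(1000)]
noisy = add_noise_at_snr(clean, snr_db=40)
print(round(measure_snr_db(clean, noisy), 3))
```

A lower target SNR yields a larger noise scale, so the same routine can generate test material across a range of noise levels.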
|