Summary: | Master's === National Taiwan University === Graduate Institute of Electrical Engineering === 98 === In speaker identification research, timbre is often used to characterize speakers. Timbre is the primary auditory cue by which humans verify a speaker's identity, and it is carried by the harmonic components of a sound wave. Consequently, most methods in the literature for extracting a speaker's speech characteristics focus on frequency-domain features. Mel-Frequency Cepstral Coefficients (MFCC) and Linear Prediction Cepstral Coefficients (LPCC) are common feature extraction methods, but they were originally designed as parameters for speech recognition, so they vary with speech content, which limits identification performance. This thesis therefore extends the idea from [5] of the consistency of the human voice to develop a feature extraction method that yields consistent feature vectors regardless of what the speaker says.
This thesis is divided into two parts. First, the idea that speaker features reside in high-frequency bands is used to improve the consistency of the feature vectors that describe the timbre difference between two speakers. Second, the method of [5] is modified to investigate the consistency of an individual's timbre. In this second part, a vocal tract model is used to obtain the frequency responses of speech, and a 22nd-order polynomial is then fitted to each frequency response. The 23 resulting coefficients are normalized and treated as a 23-dimensional feature vector, and these feature vectors are also found to be consistent. Finally, this feature extraction method is applied to speaker identification and achieves good performance.
|
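The second part of the summary (vocal tract frequency response, 22nd-order polynomial fit, 23 normalized coefficients) can be illustrated with a minimal sketch. The summary does not say which vocal tract model, frequency axis, or normalization the thesis uses, so everything specific here is an assumption: an all-pole LPC model of order 12 estimated by the autocorrelation method, a log-magnitude response on a frequency axis rescaled to [-1, 1], and unit Euclidean norm as the normalization. The function names `lpc_coefficients` and `timbre_feature` are hypothetical and are not taken from the thesis or from [5].

```python
import numpy as np


def lpc_coefficients(frame, order):
    """Autocorrelation-method LPC; returns [1, a1, ..., a_order] (assumed model)."""
    windowed = frame * np.hamming(len(frame))
    full = np.correlate(windowed, windowed, mode="full")
    # Keep autocorrelation lags 0..order.
    r = full[len(windowed) - 1 : len(windowed) + order]
    # Toeplitz normal equations R a = -r[1:].
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    R = R + 1e-9 * r[0] * np.eye(order)  # small ridge for numerical stability
    a = np.linalg.solve(R, -r[1 : order + 1])
    return np.concatenate(([1.0], a))


def timbre_feature(frame, lpc_order=12, poly_order=22, n_fft=1024):
    """Fit a polynomial to the vocal tract log-magnitude response.

    Returns poly_order + 1 coefficients scaled to unit Euclidean norm
    (the normalization is an assumption, not stated in the summary).
    """
    a = lpc_coefficients(frame, lpc_order)
    # All-pole model H(w) = 1 / A(w), so |H| in dB is -20 log10 |A|.
    A = np.fft.rfft(a, n=n_fft)
    log_mag = -20.0 * np.log10(np.abs(A) + 1e-12)
    # Rescale the frequency axis to [-1, 1] to keep the fit well conditioned.
    x = np.linspace(-1.0, 1.0, len(log_mag))
    coeffs = np.polynomial.polynomial.polyfit(x, log_mag, poly_order)
    return coeffs / np.linalg.norm(coeffs)


# Usage on a synthetic frame (random noise standing in for real speech).
rng = np.random.default_rng(0)
frame = rng.standard_normal(400)
feature = timbre_feature(frame)
```

With the default `poly_order=22`, the fit returns 23 coefficients, matching the 23-dimensional feature vector described in the summary. Comparing such vectors across different utterances from the same speaker would be one way to examine the consistency the thesis reports.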