Summary: | Master's === China Institute of Technology (中華技術學院) === Master's Program, Graduate Institute of Electronic Engineering === 96 === To date, the most effective speaker recognition techniques are based on Mel-frequency cepstral coefficients (MFCCs) [1-4,11]. MFCCs are conventionally computed in the following steps: framing, Hamming windowing, applying the FFT (Fast Fourier Transform) [7], filtering with a Mel-scale triangular filter bank, taking the logarithm of the filter output energies, and applying the DCT (Discrete Cosine Transform) [1-8]. The main contribution of this thesis is to replace the FFT [7] and the subsequent frequency-domain Mel-scale triangular filter bank [15] with a single filtering step that uses a time-domain Mel-scale triangular filter bank.
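As a point of reference, the conventional MFCC steps listed above can be sketched for a single frame as follows. This is a minimal illustration rather than the thesis's implementation: the function names, the mel formula 2595·log10(1 + f/700), the filter count, and the number of coefficients are all assumptions, and a naive DFT stands in for the FFT to keep the sketch free of dependencies.

```python
import cmath
import math

def hz_to_mel(f):
    # One common mel-scale formula (an assumption; the abstract does not name one).
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def hamming(n_samples):
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (n_samples - 1))
            for n in range(n_samples)]

def power_spectrum(frame):
    # Naive O(N^2) DFT; an FFT yields the same values faster.
    n = len(frame)
    spec = []
    for k in range(n // 2 + 1):
        s = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                for i, x in enumerate(frame))
        spec.append(abs(s) ** 2)
    return spec

def mel_filter_bank(n_filters, n_fft, sample_rate):
    # Triangular filters whose edges are equally spaced on the mel scale.
    mel_max = hz_to_mel(sample_rate / 2.0)
    edges = [mel_to_hz(mel_max * i / (n_filters + 1))
             for i in range(n_filters + 2)]
    bank = []
    for m in range(1, n_filters + 1):
        lo, mid, hi = edges[m - 1], edges[m], edges[m + 1]
        row = []
        for k in range(n_fft // 2 + 1):
            f = k * sample_rate / n_fft
            if lo < f <= mid:
                row.append((f - lo) / (mid - lo))
            elif mid < f < hi:
                row.append((hi - f) / (hi - mid))
            else:
                row.append(0.0)
        bank.append(row)
    return bank

def mfcc_frame(frame, sample_rate, n_filters=20, n_ceps=12):
    # Hamming window -> power spectrum -> mel filter bank -> log -> DCT-II.
    windowed = [x * w for x, w in zip(frame, hamming(len(frame)))]
    spec = power_spectrum(windowed)
    bank = mel_filter_bank(n_filters, len(frame), sample_rate)
    # A small floor avoids log(0) for filters that capture no energy.
    log_e = [math.log(sum(h * p for h, p in zip(row, spec)) + 1e-12)
             for row in bank]
    return [sum(e * math.cos(math.pi * c * (m + 0.5) / n_filters)
                for m, e in enumerate(log_e))
            for c in range(1, n_ceps + 1)]
```

Calling `mfcc_frame(frame, 8000)` on a 256-sample frame returns 12 coefficients. Framing the full signal into overlapping frames is left out of the sketch.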
The time-domain Mel-scale triangular filter bank [1-8,14] is designed by the least-squares method [10,13] and is then used to extract the MFCCs of the speakers' speech. Our experiments show that the speaker recognition rates of the conventional MFCC method [2,3,6,14] and of the new approach are very similar.
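The abstract does not say how the least-squares method is applied, so the following is only one plausible reading: a symmetric (linear-phase) FIR filter whose amplitude response is fitted, in the least-squares sense, to a single triangular band on a dense frequency grid. The names `design_ls_fir`, `amplitude`, and `solve`, along with the filter length and grid size, are hypothetical choices made for this sketch.

```python
import math

def triangle(f, lo, mid, hi):
    # Target response: rises from lo to mid, falls from mid to hi, zero elsewhere.
    if lo < f <= mid:
        return (f - lo) / (mid - lo)
    if mid < f < hi:
        return (hi - f) / (hi - mid)
    return 0.0

def solve(a, b):
    # Gaussian elimination with partial pivoting for a small dense system.
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def design_ls_fir(lo, mid, hi, sample_rate, half_len=32, n_grid=256):
    # Amplitude model: A(w) = a_0 + 2 * sum_{k=1..K} a_k * cos(k * w).
    freqs = [sample_rate / 2.0 * i / (n_grid - 1) for i in range(n_grid)]
    basis = [[1.0 if k == 0
              else 2.0 * math.cos(2.0 * math.pi * f / sample_rate * k)
              for k in range(half_len + 1)]
             for f in freqs]
    target = [triangle(f, lo, mid, hi) for f in freqs]
    # Normal equations (B^T B) a = B^T d of the least-squares problem.
    n = half_len + 1
    btb = [[sum(row[i] * row[j] for row in basis) for j in range(n)]
           for i in range(n)]
    btd = [sum(row[i] * d for row, d in zip(basis, target)) for i in range(n)]
    a = solve(btb, btd)
    # Symmetric impulse response of length 2K + 1.
    return [a[abs(k)] for k in range(-half_len, half_len + 1)]

def amplitude(h, f, sample_rate):
    # Zero-phase amplitude response of a symmetric impulse response h.
    half = len(h) // 2
    w = 2.0 * math.pi * f / sample_rate
    return sum(h[half + k] * math.cos(k * w) for k in range(-half, half + 1))
```

For example, `design_ls_fir(500.0, 1000.0, 1500.0, 8000.0)` returns a 65-tap filter whose amplitude response approximates a triangle peaking at 1000 Hz. Repeating the design for each mel band would give a time-domain filter bank. How the thesis derives each band's log energy from the filtered signal is not stated in the abstract.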
|