The Study of ANSI-Frequency Cepstral Coefficients (AFCCs) and Its Applications

Bibliographic Details
Main Authors: Yu, Hsin-Hua, 余芯樺
Other Authors: Liu, Chih-Wei
Format: Others
Language: en_US
Published: 2019
Online Access: http://ndltd.ncl.edu.tw/handle/gfu59c
Description
Summary: Master's === National Chiao Tung University === Institute of Electronics === 107 === Feature extraction has been an important issue in speech processing for decades. In recent years, a growing body of research has shown that Mel-Frequency Cepstral Coefficients (MFCCs) outperform other features. However, when additive noise interferes with the system, the performance of MFCCs degrades. This thesis therefore presents ANSI-Frequency Cepstral Coefficients (AFCCs) and applies them in a recognition system, a topic widely discussed in the acoustics field, to evaluate their performance. ANSI filter banks are commonly used in hearing aids. When the speech sampling rate is 16 kHz, 16 filters can be kept to replace the Mel-scaled filter banks originally applied in MFCC. Under noisy conditions, simulation results for speaker identification on the TIMIT corpus show that the average accuracy improves by 22.1% compared with the conventional system using MFCC. In the speech recognition simulations, we use isolated digits from the TIDIGIT database and 50 words recorded in our lab as test speech samples; the average accuracy improves by 12.47% and 17.63%, respectively, in the same additive-noise environments. With a noise reduction algorithm, the proposed AFCC features improve the recognition rate by 9.19% on average over AFCCs without noise reduction. The proposed AFCC feature extraction algorithm, as applied in our speech recognizer system, has been implemented in TSMC 90 nm CMOS high-VT technology. It can process 16 kHz audio in real time. The chip operates at 50 MHz, and the speech recognizer system consumes about 1.779 mW (at 0.9 V) with clock gating, which makes it suitable for portable devices because of its low power. The total gate count of the proposed speech recognizer system is about 190k.
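
The summary describes AFCCs only at a high level: the Mel-scaled filter bank of MFCC is replaced by 16 ANSI filters at a 16 kHz sampling rate. The following is a minimal Python sketch of that idea, not the thesis implementation. It assumes that "ANSI filter banks" means one-third-octave bands in the style of ANSI S1.11, that the 16 bands start at band number 22 (about 160 Hz), and that the bands are rectangular; none of these details is stated in the summary. The function names third_octave_bands, band_energies, and cepstral_coeffs are hypothetical.

```python
import math

def third_octave_bands(first_band, n_bands):
    """Return (lower, center, upper) in Hz for consecutive one-third-octave bands.

    Assumption: base-2 convention f_c = 1000 * 2**((n - 30) / 3), so band 30
    is centered at 1 kHz. The starting band is illustrative only.
    """
    half_width = 2.0 ** (1.0 / 6.0)  # half of a one-third octave
    bands = []
    for n in range(first_band, first_band + n_bands):
        center = 1000.0 * 2.0 ** ((n - 30) / 3.0)
        bands.append((center / half_width, center, center * half_width))
    return bands

def band_energies(power_spectrum, sample_rate, bands):
    """Sum the power-spectrum bins falling in each band (rectangular bands).

    power_spectrum is assumed to cover 0 Hz to sample_rate / 2 inclusive.
    """
    hz_per_bin = (sample_rate / 2.0) / (len(power_spectrum) - 1)
    energies = []
    for lower, _, upper in bands:
        total = sum(p for k, p in enumerate(power_spectrum)
                    if lower <= k * hz_per_bin < upper)
        energies.append(total)
    return energies

def cepstral_coeffs(energies, n_coeffs, floor=1e-12):
    """Unnormalized DCT-II of log band energies, the final step also used for MFCCs."""
    logs = [math.log(max(e, floor)) for e in energies]
    m = len(logs)
    return [sum(v * math.cos(math.pi * k * (i + 0.5) / m)
                for i, v in enumerate(logs))
            for k in range(n_coeffs)]

# Example: 16 bands at 16 kHz applied to a flat 257-bin power spectrum.
bands = third_octave_bands(22, 16)
energies = band_energies([1.0] * 257, 16000, bands)
features = cepstral_coeffs(energies, 13)
```

Under these assumptions the highest band ends near 5.66 kHz, below the 8 kHz Nyquist limit of 16 kHz audio. The number of cepstral coefficients (13 here) is also an illustrative choice, not a figure from the thesis.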