The Study of ANSI-Frequency Cepstral Coefficients (AFCCs) and Its Applications

Master's thesis === National Chiao Tung University === Institute of Electronics === 107 === Feature extraction has been an important issue in speech processing for decades. In recent years, a growing body of research has shown that Mel-Frequency Cepstral Coefficients (MFCCs) outperform other features. However, when additive noise interferes with the sys...

Bibliographic Details
Main Authors: Yu, Hsin-Hua, 余芯樺
Other Authors: Liu, Chih-Wei
Format: Others
Language: en_US
Published: 2019
Online Access: http://ndltd.ncl.edu.tw/handle/gfu59c
id ndltd-TW-107NCTU5428088
record_format oai_dc
spelling ndltd-TW-107NCTU5428088 2019-06-27T05:42:46Z http://ndltd.ncl.edu.tw/handle/gfu59c The Study of ANSI-Frequency Cepstral Coefficients (AFCCs) and Its Applications ANSI頻率倒譜係數的研究與應用 Yu, Hsin-Hua 余芯樺 Master's thesis, National Chiao Tung University, Institute of Electronics, 107. Liu, Chih-Wei 劉志尉 2019 thesis 73 en_US
collection NDLTD
language en_US
format Others
sources NDLTD
description Master's thesis === National Chiao Tung University === Institute of Electronics === 107 === Feature extraction has been an important issue in speech processing for decades. In recent years, a growing body of research has shown that Mel-Frequency Cepstral Coefficients (MFCCs) outperform other features. However, the performance of MFCCs degrades when additive noise interferes with the system. This thesis therefore presents ANSI-Frequency Cepstral Coefficients (AFCCs) and applies these features in a recognition system, which has been widely discussed in the acoustics field, to evaluate their performance. ANSI filter banks are commonly used in hearing aids. When the speech sampling rate is 16 kHz, 16 ANSI filters can be kept to replace the Mel-scaled filter bank originally used in MFCC. Under noisy conditions, simulation results for speaker identification on the TIMIT corpus show an average accuracy gain of 22.1% over the conventional MFCC-based system. In the speech recognition simulations, the test samples are isolated digits from the TIDIGIT database and 50 words recorded in our lab; the average accuracy improves by 12.47% and 17.63%, respectively, in the same additive-noise environments. With a noise reduction algorithm, the proposed AFCC features improve the recognition rate by 9.19% on average compared with AFCCs without noise reduction. The proposed AFCC feature extraction algorithm, applied in our speech recognizer system, has been implemented in TSMC 90 nm CMOS high-VT technology. It can process 16 kHz audio in real time. The chip operates at 50 MHz, and the speech recognizer system consumes about 1.779 mW (at 0.9 V) with clock gating, which makes it suitable for portable devices because of its low power. The total gate count of the proposed speech recognizer system is about 190k.
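The abstract describes swapping the Mel-scaled filter bank of MFCC for 16 ANSI-style bands at a 16 kHz sampling rate. The following is a minimal sketch of that general idea, not the thesis's implementation: it assumes 1/3-octave spacing, brick-wall band edges on FFT bins, a lowest center frequency of 200 Hz, a 512-point FFT, and 13 output coefficients. The function names `third_octave_filterbank` and `cepstral_coefficients` and all parameter values are hypothetical choices made for illustration.

```python
import numpy as np

def third_octave_filterbank(sr=16000, n_fft=512, n_bands=16, f_low=200.0):
    """Brick-wall 1/3-octave bands on FFT bins (assumed layout, not from the thesis)."""
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    # Center frequencies spaced one third of an octave apart.
    centers = f_low * 2.0 ** (np.arange(n_bands) / 3.0)
    lower = centers * 2.0 ** (-1.0 / 6.0)
    upper = centers * 2.0 ** (1.0 / 6.0)
    bank = np.zeros((n_bands, freqs.size))
    for k in range(n_bands):
        # Weight 1 for bins inside the band, 0 elsewhere.
        bank[k] = (freqs >= lower[k]) & (freqs < upper[k])
    return bank, centers

def cepstral_coefficients(frame, bank, n_ceps=13):
    """Window -> power spectrum -> band log-energies -> DCT-II."""
    n_fft = 2 * (bank.shape[1] - 1)
    windowed = frame * np.hamming(frame.size)
    power = np.abs(np.fft.rfft(windowed, n=n_fft)) ** 2
    # Small floor avoids log(0) for silent bands.
    log_energy = np.log(bank @ power + 1e-10)
    n_bands = bank.shape[0]
    # Unnormalized DCT-II basis built directly with NumPy.
    basis = np.cos(
        np.pi * np.outer(np.arange(n_ceps), np.arange(n_bands) + 0.5) / n_bands
    )
    return basis @ log_energy

# Usage: one 25 ms frame of a 1 kHz tone sampled at 16 kHz.
bank, centers = third_octave_filterbank()
frame = np.sin(2 * np.pi * 1000.0 * np.arange(400) / 16000.0)
coeffs = cepstral_coefficients(frame, bank)
```

Only the filter bank differs from a conventional MFCC pipeline in this sketch; substituting a Mel-scaled triangular bank for `bank` would give the baseline the abstract compares against.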