The Study of ANSI-Frequency Cepstral Coefficients (AFCCs) and Its Applications
Master's thesis === National Chiao Tung University === Institute of Electronics === academic year 107 === Feature extraction has been an important issue in speech processing for decades. In recent years, a growing body of research has shown that Mel-Frequency Cepstral Coefficients (MFCCs) outperform other features. However, when additive noise interferes with the sys...
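For reference, the MFCC baseline that this work compares against builds its features on a Mel-scaled filter bank. The Python sketch below shows one common way to place those filters. It assumes the HTK-style formula m = 2595 * log10(1 + f / 700), 16 kHz audio, and 26 triangular filters between 0 Hz and 8 kHz. These settings and the helper names (`hz_to_mel`, `mel_to_hz`, `mel_filter_edges`, `triangle_weight`) are illustrative assumptions, not the thesis's documented configuration.

```python
from __future__ import annotations

import math


def hz_to_mel(f_hz: float) -> float:
    """HTK-style Mel scale (assumed): m = 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filter_edges(n_filters: int, f_low: float, f_high: float) -> list[float]:
    """Return n_filters + 2 frequencies (Hz) equally spaced on the Mel scale.

    Filter k (0-based) rises from edges[k], peaks at edges[k + 1],
    and falls back to zero at edges[k + 2].
    """
    m_low, m_high = hz_to_mel(f_low), hz_to_mel(f_high)
    step = (m_high - m_low) / (n_filters + 1)
    return [mel_to_hz(m_low + i * step) for i in range(n_filters + 2)]


def triangle_weight(f: float, left: float, center: float, right: float) -> float:
    """Gain of one triangular filter at frequency f (Hz)."""
    if f <= left or f >= right:
        return 0.0
    if f <= center:
        return (f - left) / (center - left)
    return (right - f) / (right - center)


if __name__ == "__main__":
    # Assumed baseline: 16 kHz audio, 26 filters spanning 0 Hz to the 8 kHz Nyquist limit.
    edges = mel_filter_edges(26, 0.0, 8000.0)
    print("filter centers (Hz):", [round(c) for c in edges[1:-1]])
```

With these assumed settings the filters are narrow at low frequencies and widen toward 8 kHz, because the Mel scale is close to linear below about 1 kHz and close to logarithmic above it.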
Main Authors: | Yu, Hsin-Hua 余芯樺 |
---|---|
Other Authors: | Liu, Chih-Wei 劉志尉 |
Format: | Others |
Language: | en_US |
Published: | 2019 |
Online Access: | http://ndltd.ncl.edu.tw/handle/gfu59c |
id |
ndltd-TW-107NCTU5428088 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-TW-107NCTU5428088 2019-06-27T05:42:46Z http://ndltd.ncl.edu.tw/handle/gfu59c The Study of ANSI-Frequency Cepstral Coefficients (AFCCs) and Its Applications ANSI頻率倒譜係數的研究與應用 Yu, Hsin-Hua 余芯樺 Master's thesis National Chiao Tung University Institute of Electronics academic year 107 Liu, Chih-Wei 劉志尉 2019 degree thesis ; thesis 73 en_US |
collection |
NDLTD |
sources |
NDLTD |
description |
Master's thesis === National Chiao Tung University === Institute of Electronics === academic year 107 === Feature extraction has been an important issue in speech processing for decades. In recent years, a growing body of research has shown that Mel-Frequency Cepstral Coefficients (MFCCs) outperform other features. However, when additive noise interferes with the system, the performance of MFCCs degrades. This thesis therefore presents ANSI-Frequency Cepstral Coefficients (AFCCs) and evaluates them in recognition systems, a topic widely discussed in the acoustics field. ANSI filter banks are popular in hearing aids. When speech is sampled at 16 kHz, 16 of these filters can be kept to replace the Mel-scaled filter bank originally used in MFCCs. Under noisy conditions, speaker-identification simulations on the TIMIT corpus show an average accuracy gain of 22.1% over the conventional MFCC-based system. For the speech-recognition simulations, isolated digits from the TIDIGITS database and 50 words recorded in our lab serve as test samples; under the same additive-noise conditions, average accuracy improves by 12.47% and 17.63%, respectively. With a noise reduction algorithm, the proposed AFCCs improve the recognition rate by 9.19% on average over AFCCs without noise reduction.
The proposed AFCC feature extraction algorithm, as used in our speech recognizer, has been implemented in TSMC 90 nm high-VT CMOS technology and processes 16 kHz audio in real time. The chip operates at 50 MHz, and the speech recognizer consumes about 1.779 mW at 0.9 V with clock gating; this low power makes it suitable for portable devices. The total gate count of the proposed speech recognizer is about 190k.
|
author2 |
Liu, Chih-Wei |
author |
Yu, Hsin-Hua 余芯樺 |
title |
The Study of ANSI-Frequency Cepstral Coefficients (AFCCs) and Its Applications |
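The abstract states that an ANSI filter bank, common in hearing aids, keeps 16 filters at a 16 kHz sampling rate and replaces the Mel-scaled filter bank used in MFCCs. The Python sketch below shows one plausible reading of that feature pipeline. It assumes the base-ten one-third-octave bands defined in ANSI S1.11 (centers at 1000 * 10**(x / 10) Hz, edges at the center times 10**(-/+0.05)), keeps the 16 highest bands that fit below the 8 kHz Nyquist frequency, approximates each band by summing FFT power bins, and reuses the usual MFCC back end (log, then DCT-II). The band selection, frame length, coefficient count, and every function name here are assumptions, not the thesis's documented design.

```python
from __future__ import annotations

import cmath
import math

FS = 16_000   # sampling rate stated in the abstract (Hz)
N_FFT = 512   # assumed frame length; not stated in the record
N_CEPS = 13   # assumed number of cepstral coefficients; not stated in the record


def ansi_third_octave_bands(fs: float, n_bands: int = 16) -> list[tuple[float, float, float]]:
    """Return (lower, center, upper) edges in Hz of base-ten one-third-octave bands.

    Centers follow f_c = 1000 * 10**(x / 10); edges are f_c * 10**(-/+0.05).
    Assumption: keep the n_bands highest bands whose upper edge is below Nyquist.
    """
    nyquist = fs / 2
    # Largest integer x with 1000 * 10**(x / 10 + 0.05) < nyquist.
    x_max = math.ceil(10 * math.log10(nyquist / 1000) - 0.5) - 1
    bands = []
    for x in range(x_max - n_bands + 1, x_max + 1):
        fc = 1000.0 * 10 ** (x / 10)
        bands.append((fc * 10 ** -0.05, fc, fc * 10 ** 0.05))
    return bands


def power_spectrum(frame: list[float]) -> list[float]:
    """|X[k]|^2 for k = 0..N/2 via a direct (slow but dependency-free) DFT."""
    n = len(frame)
    return [
        abs(sum(s * cmath.exp(-2j * math.pi * k * t / n) for t, s in enumerate(frame))) ** 2
        for k in range(n // 2 + 1)
    ]


def afcc(frame: list[float], fs: float = FS, n_ceps: int = N_CEPS) -> list[float]:
    """Hypothetical AFCC sketch: Hamming window, power spectrum,
    ANSI band energies, log, then DCT-II (mirrors the usual MFCC back end)."""
    n = len(frame)
    windowed = [s * (0.54 - 0.46 * math.cos(2 * math.pi * t / (n - 1))) for t, s in enumerate(frame)]
    spectrum = power_spectrum(windowed)
    bin_hz = fs / n
    log_energies = []
    for lower, _center, upper in ansi_third_octave_bands(fs):
        # Rectangular band in the frequency domain: an approximation of
        # time-domain ANSI band-pass filters.
        energy = sum(p for k, p in enumerate(spectrum) if lower <= k * bin_hz < upper)
        log_energies.append(math.log(energy + 1e-12))  # small floor avoids log(0)
    n_bands = len(log_energies)
    return [
        sum(e * math.cos(math.pi * q * (k + 0.5) / n_bands) for k, e in enumerate(log_energies))
        for q in range(n_ceps)
    ]


if __name__ == "__main__":
    # A 1 kHz test tone, one frame long.
    tone = [math.sin(2 * math.pi * 1000 * t / FS) for t in range(N_FFT)]
    print([round(c, 2) for c in afcc(tone)])
```

Under these assumptions the retained band centers run from about 200 Hz to about 6.3 kHz, with the 1 kHz band at index 7. The direct DFT only keeps the example dependency-free; a practical implementation would use an FFT or time-domain filters instead.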
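The abstract reports accuracy under additive noise, but the record does not list the noise types or SNR levels used. As a hedged illustration of how such test conditions are commonly built, the Python sketch below scales a noise signal so that the clean-plus-noise mixture reaches a chosen SNR. The white Gaussian noise, the 440 Hz tone, the SNR values, and the function names (`add_noise_at_snr`, `measured_snr_db`) are placeholders, not the thesis's test setup.

```python
from __future__ import annotations

import math
import random


def add_noise_at_snr(clean: list[float], noise: list[float], snr_db: float) -> list[float]:
    """Return clean + g * noise, with g chosen so that
    10 * log10(P_clean / P_scaled_noise) equals snr_db."""
    if len(noise) < len(clean):
        raise ValueError("noise must be at least as long as the clean signal")
    noise = noise[: len(clean)]
    p_clean = sum(s * s for s in clean) / len(clean)
    p_noise = sum(v * v for v in noise) / len(noise)
    if p_noise == 0.0:
        raise ValueError("noise has zero power")
    gain = math.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return [s + gain * v for s, v in zip(clean, noise)]


def measured_snr_db(clean: list[float], noisy: list[float]) -> float:
    """SNR of noisy relative to clean, treating (noisy - clean) as the noise."""
    p_clean = sum(s * s for s in clean)
    p_resid = sum((y - s) ** 2 for s, y in zip(clean, noisy))
    return 10 * math.log10(p_clean / p_resid)


if __name__ == "__main__":
    random.seed(0)
    fs = 16_000
    clean = [math.sin(2 * math.pi * 440 * t / fs) for t in range(fs)]  # 1 s placeholder tone
    white = [random.gauss(0.0, 1.0) for _ in range(fs)]                # placeholder noise
    for target in (20.0, 10.0, 0.0):
        noisy = add_noise_at_snr(clean, white, target)
        print(target, round(measured_snr_db(clean, noisy), 3))
```

Scaling the noise rather than the speech keeps the speech level fixed across conditions, so only the interference changes from one SNR to the next.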
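The power and clock figures in the abstract can be turned into per-cycle and per-sample budgets with simple arithmetic. The short Python calculation below does this under the assumption that the reported 1.779 mW is an average over continuous real-time operation; it says nothing about how that power is split between AFCC extraction and the rest of the recognizer.

```python
# Figures reported in the abstract for the whole speech recognizer chip.
POWER_W = 1.779e-3        # about 1.779 mW at 0.9 V, with clock gating
CLOCK_HZ = 50e6           # 50 MHz operating clock
SAMPLE_RATE_HZ = 16e3     # real-time 16 kHz audio input

cycles_per_sample = CLOCK_HZ / SAMPLE_RATE_HZ    # clock cycles available per input sample
energy_per_cycle_j = POWER_W / CLOCK_HZ          # average energy per clock cycle
energy_per_sample_j = POWER_W / SAMPLE_RATE_HZ   # average energy per input sample

print(f"cycles per sample : {cycles_per_sample:.0f}")
print(f"energy per cycle  : {energy_per_cycle_j * 1e12:.2f} pJ")
print(f"energy per sample : {energy_per_sample_j * 1e9:.1f} nJ")
```

It prints 3125 cycles per sample, about 35.58 pJ per cycle, and about 111.2 nJ per input sample.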