Replay Speech Answer-Sheet Detection on Intelligent Language Learning System Based on Power Spectrum Decomposition

Replay speech answer-sheet detection is an urgent problem for intelligent language learning systems. Traditional features for replay speech detection are often extracted from the power spectrum. However, the power spectrum may not be the optimal spectrum from which to extract features for replay speech...

Bibliographic Details
Main Authors: Qingzhu Wu, Shaowei Xiong, Zhengyu Zhu
Format: Article
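Language: English
Published: IEEE 2021-01-01
Series: IEEE Access
Subjects: Replay speech answer-sheet detection; power spectrum decomposition; log frame-wise normalization spectrum; log spectral energy; intelligent language learning system
Online Access: https://ieeexplore.ieee.org/document/9490247/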
id doaj-8ed121bb2ed444cbac57befbb1cfcca8
record_format Article
spelling doaj-8ed121bb2ed444cbac57befbb1cfcca8
2021-07-29T23:00:30Z
eng
IEEE
IEEE Access
2169-3536
2021-01-01
9
104197
104204
10.1109/ACCESS.2021.3098058
9490247
Replay Speech Answer-Sheet Detection on Intelligent Language Learning System Based on Power Spectrum Decomposition
Qingzhu Wu (0) https://orcid.org/0000-0002-6107-0016
Shaowei Xiong (1)
Zhengyu Zhu (2)
Guangdong Mechanical and Electrical Polytechnic, Guangzhou, China
Guangdong Mechanical and Electrical Polytechnic, Guangzhou, China
School of Cyberspace Security, Guangdong Polytechnic Normal University, Guangzhou, China
https://ieeexplore.ieee.org/document/9490247/
Replay speech answer-sheet detection
power spectrum decomposition
log frame-wise normalization spectrum
log spectral energy
intelligent language learning system
collection DOAJ
language English
format Article
sources DOAJ
author Qingzhu Wu
Shaowei Xiong
Zhengyu Zhu
spellingShingle Qingzhu Wu
Shaowei Xiong
Zhengyu Zhu
Replay Speech Answer-Sheet Detection on Intelligent Language Learning System Based on Power Spectrum Decomposition
IEEE Access
Replay speech answer-sheet detection
power spectrum decomposition
log frame-wise normalization spectrum
log spectral energy
intelligent language learning system
author_facet Qingzhu Wu
Shaowei Xiong
Zhengyu Zhu
author_sort Qingzhu Wu
title Replay Speech Answer-Sheet Detection on Intelligent Language Learning System Based on Power Spectrum Decomposition
title_short Replay Speech Answer-Sheet Detection on Intelligent Language Learning System Based on Power Spectrum Decomposition
title_full Replay Speech Answer-Sheet Detection on Intelligent Language Learning System Based on Power Spectrum Decomposition
title_fullStr Replay Speech Answer-Sheet Detection on Intelligent Language Learning System Based on Power Spectrum Decomposition
title_full_unstemmed Replay Speech Answer-Sheet Detection on Intelligent Language Learning System Based on Power Spectrum Decomposition
title_sort replay speech answer-sheet detection on intelligent language learning system based on power spectrum decomposition
publisher IEEE
series IEEE Access
issn 2169-3536
publishDate 2021-01-01
description Replay speech answer-sheet detection is an urgent problem for intelligent language learning systems. Traditional features for replay speech detection are often extracted from the power spectrum. However, the power spectrum may not be the optimal spectrum from which to extract features for replay speech answer-sheet detection, because it does not account for the characteristics of replay speech. To address this limitation, this paper proposes a power spectrum decomposition method for replay speech answer-sheet detection in intelligent language learning systems. The log frame-wise normalization spectrum (LFNS) and the log spectral energy (LSE), which account for the characteristics of replay speech, are obtained by decomposing the log power spectrum based on the constant-Q transform. Two further features are then derived from LFNS and LSE. The first is the constant-Q normalization octave coefficients (CNOC), obtained by combining LFNS with an octave subband transform. The second is CNOC-LSE, obtained by combining CNOC and LSE. LFNS, CNOC and CNOC-LSE are then fed into frame-based and utterance-based neural networks. Experimental results show that the proposed LFNS outperforms the conventional log power spectrum, and that CNOC and CNOC-LSE perform better than most commonly used features. The utterance-based neural network outperforms the frame-based neural network with the same inputs. In addition, handcrafted features perform worse than the corresponding spectrum for the utterance-based neural network, while the opposite holds for the frame-based neural network.
topic Replay speech answer-sheet detection
power spectrum decomposition
log frame-wise normalization spectrum
log spectral energy
intelligent language learning system
url https://ieeexplore.ieee.org/document/9490247/
work_keys_str_mv AT qingzhuwu replayspeechanswersheetdetectiononintelligentlanguagelearningsystembasedonpowerspectrumdecomposition
AT shaoweixiong replayspeechanswersheetdetectiononintelligentlanguagelearningsystembasedonpowerspectrumdecomposition
AT zhengyuzhu replayspeechanswersheetdetectiononintelligentlanguagelearningsystembasedonpowerspectrumdecomposition
_version_ 1721247997006708736
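
The description field above says that LFNS and LSE are obtained by decomposing the log power spectrum, but this record does not give the formulas. The sketch below is only one plausible reading, not the authors' stated method: it assumes that each frame's power values are divided by the frame's total energy, so that the log power splits into a normalized shape term (here called lfns) and a per-frame log energy (here called lse). The function name, the eps guard, and the toy input are hypothetical; the constant-Q transform that would supply the power values is not shown.

```python
import math


def decompose_log_power(power_frames, eps=1e-12):
    """Split each frame's log power spectrum into a shape term and an energy term.

    ASSUMPTION: this normalization is an illustrative guess, not taken from the paper.
    power_frames: list of frames, each a list of non-negative power values
    (hypothetically the squared magnitudes of constant-Q transform bins).
    Returns (lfns, lse) such that log(power[t][k]) ~= lfns[t][k] + lse[t].
    """
    lfns, lse = [], []
    for frame in power_frames:
        energy = sum(frame) + eps      # total spectral energy of this frame
        lse.append(math.log(energy))   # one scalar per frame
        # log of the energy-normalized spectrum; eps guards against log(0)
        lfns.append([math.log(p / energy + eps) for p in frame])
    return lfns, lse


# Toy input: two frames with two frequency bins each (made-up values).
frames = [[1.0, 3.0], [2.0, 2.0]]
lfns, lse = decompose_log_power(frames)
```

Under this assumption, adding lse[t] back to every bin of lfns[t] recovers the original log power spectrum up to the eps guard, so the two outputs separate spectral shape from overall frame energy.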