Towards More Efficient DNN-Based Speech Enhancement Using Quantized Correlation Mask

Many studies on deep learning-based speech enhancement (SE) that use the computational auditory scene analysis method typically employ the ideal binary mask or the ideal ratio mask to reconstruct the enhanced speech signal. However, many SE applications in real scenarios demand a desirable balance...

Bibliographic Details
Main Authors:	Salinna Abdullah, Majid Zamani, Andreas Demosthenous
Format:	Article
Language:	English
Published:	IEEE 2021-01-01
Series:	IEEE Access
Subjects:	Correlation coefficients; deep neural network; dynamic noise-aware training; quantization; speech enhancement; training targets
Online Access:	https://ieeexplore.ieee.org/document/9345671/

id	doaj-2cf0bf202dd84fa589ee498b484b460b
record_format	Article
spelling	doaj-2cf0bf202dd84fa589ee498b484b460b | 2021-03-30T14:56:27Z | eng | IEEE | IEEE Access | 2169-3536 | 2021-01-01 | 9 | 24350 | 24362 | 10.1109/ACCESS.2021.3056711 | 9345671 | Towards More Efficient DNN-Based Speech Enhancement Using Quantized Correlation Mask | Salinna Abdullah (0) https://orcid.org/0000-0003-0092-3190 | Majid Zamani (1) https://orcid.org/0000-0002-8986-757X | Andreas Demosthenous (2) https://orcid.org/0000-0003-0623-963X | Department of Electronic and Electrical Engineering, University College London (UCL), London, U.K. | Department of Electronic and Electrical Engineering, University College London (UCL), London, U.K. | Department of Electronic and Electrical Engineering, University College London (UCL), London, U.K. | https://ieeexplore.ieee.org/document/9345671/ | Correlation coefficients | deep neural network | dynamic noise-aware training | quantization | speech enhancement | training targets
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Salinna Abdullah, Majid Zamani, Andreas Demosthenous
spellingShingle	Salinna Abdullah | Majid Zamani | Andreas Demosthenous | Towards More Efficient DNN-Based Speech Enhancement Using Quantized Correlation Mask | IEEE Access | Correlation coefficients | deep neural network | dynamic noise-aware training | quantization | speech enhancement | training targets
author_facet	Salinna Abdullah, Majid Zamani, Andreas Demosthenous
author_sort	Salinna Abdullah
title	Towards More Efficient DNN-Based Speech Enhancement Using Quantized Correlation Mask
title_short	Towards More Efficient DNN-Based Speech Enhancement Using Quantized Correlation Mask
title_full	Towards More Efficient DNN-Based Speech Enhancement Using Quantized Correlation Mask
title_fullStr	Towards More Efficient DNN-Based Speech Enhancement Using Quantized Correlation Mask
title_full_unstemmed	Towards More Efficient DNN-Based Speech Enhancement Using Quantized Correlation Mask
title_sort	towards more efficient dnn-based speech enhancement using quantized correlation mask
publisher	IEEE
series	IEEE Access
issn	2169-3536
publishDate	2021-01-01
description	Many studies on deep learning-based speech enhancement (SE) that use the computational auditory scene analysis method typically employ the ideal binary mask or the ideal ratio mask to reconstruct the enhanced speech signal. However, many SE applications in real scenarios demand a balance between denoising capability and computational cost. In this study, first, an improvement over the ideal ratio mask is proposed: an efficient adaptive correlation-based factor adjusts the ratio mask to attain superior SE performance. The proposed method exploits the correlation coefficients among the noisy speech, noise and clean speech to redistribute the power ratio of the speech and noise during the ratio mask construction phase. Second, to make the supervised SE system more computationally efficient, quantization techniques are considered to reduce the number of bits needed to represent floating-point numbers, leading to a more compact SE model. The proposed quantized correlation mask is used in conjunction with a 4-layer deep neural network (DNN-QCM) comprising dropout regularization, pre-training and noise-aware training to derive a robust and high-order mapping in enhancement, and to improve generalization capability in unseen conditions. Results show that the quantized correlation mask outperforms the conventional ratio mask representation and the other SE algorithms used for comparison. When compared to a DNN with the ideal ratio mask as its learning target, the DNN-QCM provided an improvement of approximately 6.5% in the short-time objective intelligibility score and 11.0% in the perceptual evaluation of speech quality score. The quantization method can reduce the neural network weights from a 32-bit to a 5-bit representation while still effectively suppressing stationary and non-stationary noise. Timing analyses also show that, with the techniques incorporated in the proposed DNN-QCM system to increase its compactness, the training and inference times can be reduced by 15.7% and 10.5%, respectively.
topic	Correlation coefficients; deep neural network; dynamic noise-aware training; quantization; speech enhancement; training targets
url	https://ieeexplore.ieee.org/document/9345671/
work_keys_str_mv	AT salinnaabdullah towardsmoreefficientdnnbasedspeechenhancementusingquantizedcorrelationmask AT majidzamani towardsmoreefficientdnnbasedspeechenhancementusingquantizedcorrelationmask AT andreasdemosthenous towardsmoreefficientdnnbasedspeechenhancementusingquantizedcorrelationmask
_version_	1724180312608473088
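
Illustrative sketch (not part of the catalog record). The abstract names two building blocks: a ratio mask built from speech and noise power, and quantization of network weights to a 5-bit representation. The abstract does not state the correlation-based adjustment factor or the exact quantization scheme, so the Python sketch below only shows the conventional ideal ratio mask in its commonly used form, (S / (S + N)) ** beta with beta = 0.5, and a generic uniform quantizer. The function names ideal_ratio_mask and quantize_uniform, the beta default, and the min-max uniform scheme are assumptions made for illustration, not the authors' implementation.

```python
def ideal_ratio_mask(speech_power, noise_power, beta=0.5):
    """Conventional ideal ratio mask per time-frequency bin.

    speech_power and noise_power are sequences of non-negative power values.
    beta = 0.5 is a common choice; it is an assumption here, not taken from the record.
    """
    mask = []
    for s, n in zip(speech_power, noise_power):
        total = s + n
        # Bins with no energy at all are masked out to avoid division by zero.
        mask.append((s / total) ** beta if total > 0 else 0.0)
    return mask


def quantize_uniform(values, bits=5):
    """Generic uniform quantizer (assumed scheme, for illustration only).

    Maps each value to the nearest of 2**bits evenly spaced levels over
    [min(values), max(values)] and returns (dequantized_values, integer_codes).
    """
    lo, hi = min(values), max(values)
    levels = 2 ** bits
    if hi == lo:
        # A constant input needs only one level.
        return list(values), [0] * len(values)
    step = (hi - lo) / (levels - 1)
    codes = [round((v - lo) / step) for v in values]
    dequantized = [lo + c * step for c in codes]
    return dequantized, codes
```

For example, quantize_uniform(weights, bits=5) stores each weight as an integer code between 0 and 31, and the reconstruction error per weight is at most half of one quantization step. A mask from ideal_ratio_mask would be multiplied bin by bin with the noisy magnitude spectrum to estimate the clean speech.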

Similar Items