Speech Enhancement Using Generative Adversarial Network by Distilling Knowledge from Statistical Method

This paper presents a new deep neural network (DNN)-based speech enhancement algorithm that integrates knowledge distilled from a traditional statistical method. Unlike other DNN-based methods, which typically train many different models on the same data and average their predictions, or use a large number of noise types to enlarge the simulated noisy-speech corpus, the proposed method neither trains a whole ensemble of models nor requires a mass of simulated noisy speech. It first trains a discriminator network and a generator network simultaneously via adversarial learning. Both networks are then re-trained by distilling knowledge from the statistical method, an approach inspired by knowledge distillation in neural networks. Finally, the generator network is fine-tuned on real noisy speech. Experiments on the CHiME-4 data sets demonstrate that the proposed method achieves more robust performance than the compared DNN-based method in terms of perceptual speech quality.
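
For orientation only, below is a minimal sketch (in PyTorch) of the three-stage training scheme the abstract describes. Everything in it is an illustrative assumption rather than the authors' configuration: a simple magnitude-spectrum mapping task, a Wiener-style spectral gain standing in for the statistical teacher, and arbitrary layer sizes, loss weighting, and hyperparameters.

# Minimal sketch of the three training stages described in the abstract.
# Assumptions: magnitude-spectrum in/out, a Wiener-style gain as the
# "statistical method" teacher, hypothetical sizes and hyperparameters.
import torch
import torch.nn as nn

F_BINS = 257  # assumed number of STFT magnitude bins per frame

generator = nn.Sequential(nn.Linear(F_BINS, 512), nn.ReLU(), nn.Linear(512, F_BINS))
discriminator = nn.Sequential(nn.Linear(F_BINS, 512), nn.ReLU(), nn.Linear(512, 1))

g_opt = torch.optim.Adam(generator.parameters(), lr=1e-4)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=1e-4)
bce = nn.BCEWithLogitsLoss()  # discriminator outputs raw logits
mse = nn.MSELoss()

def statistical_teacher(noisy_mag, noise_mag):
    # Stand-in for the traditional statistical enhancer: a Wiener-style gain
    # computed from an (assumed available) noise magnitude estimate.
    snr = (noisy_mag - noise_mag).clamp(min=0.0) / noise_mag.clamp(min=1e-8)
    return noisy_mag * snr / (snr + 1.0)

def stage1_adversarial_step(noisy, clean):
    # Stage 1: train discriminator and generator simultaneously (standard
    # GAN updates on simulated noisy/clean pairs).
    real = torch.ones(noisy.size(0), 1)
    fake = torch.zeros(noisy.size(0), 1)
    d_opt.zero_grad()
    d_loss = bce(discriminator(clean), real) + \
             bce(discriminator(generator(noisy).detach()), fake)
    d_loss.backward()
    d_opt.step()
    g_opt.zero_grad()
    g_loss = bce(discriminator(generator(noisy)), real)  # fool the discriminator
    g_loss.backward()
    g_opt.step()

def stage2_distillation_step(noisy, noise_est, alpha=0.5):
    # Stage 2: re-train with the statistical method's output as a soft target
    # (the paper re-trains both networks; only the generator side is shown).
    g_opt.zero_grad()
    enhanced = generator(noisy)
    soft_target = statistical_teacher(noisy, noise_est)
    loss = alpha * mse(enhanced, soft_target) + \
           (1 - alpha) * bce(discriminator(enhanced),
                             torch.ones(noisy.size(0), 1))
    loss.backward()
    g_opt.step()

def stage3_finetune_step(real_noisy, noise_est):
    # Stage 3: fine-tune the generator on real noisy speech, where no clean
    # reference exists; the statistical estimate again supplies supervision.
    g_opt.zero_grad()
    loss = mse(generator(real_noisy), statistical_teacher(real_noisy, noise_est))
    loss.backward()
    g_opt.step()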

Bibliographic Details
Main Authors: Jianfeng Wu, Yongzhu Hua, Shengying Yang, Hongshuai Qin, Huibin Qin
Affiliation: The Institute of Electron Device & Application, Hangzhou Dianzi University, Hangzhou 310018, China
Format: Article
Language: English
Published: MDPI AG, 2019-08-01
Series: Applied Sciences
ISSN: 2076-3417
DOI: 10.3390/app9163396
Subjects: speech enhancement; deep neural network; generative adversarial network; knowledge distillation
Online Access: https://www.mdpi.com/2076-3417/9/16/3396