Graph Convolution-Based Deep Clustering for Speech Separation

Deep clustering is a promising technique for speech separation, a task crucial to speech communication, acoustic target detection, acoustic enhancement, and speech recognition. In monophonic speech separation, the problem is the decrease in separation and generalization performance...

Bibliographic Details
Main Authors:	Shan Qin, Ting Jiang, Sheng Wu, Ning Wang, Xinran Zhao
Format:	Article
Language:	English
Published:	IEEE 2020-01-01
Series:	IEEE Access
Subjects:	Construction of graph-structured data; deep clustering; graph convolutional filter; speech separation
Online Access:	https://ieeexplore.ieee.org/document/9076605/

id	doaj-b0791a734ea841bba8246434680072b7
record_format	Article
spelling	doaj-b0791a734ea841bba8246434680072b7 | 2021-03-30T01:44:39Z | eng | IEEE | IEEE Access | 2169-3536 | 2020-01-01 | 8 | 82571 | 82580 | 10.1109/ACCESS.2020.2989833 | 9076605 | Graph Convolution-Based Deep Clustering for Speech Separation | Shan Qin [0] https://orcid.org/0000-0002-9985-3163; Ting Jiang [1] https://orcid.org/0000-0003-3598-3804; Sheng Wu [2] https://orcid.org/0000-0002-9947-9968; Ning Wang [3] https://orcid.org/0000-0003-1381-7952; Xinran Zhao [4] https://orcid.org/0000-0002-6977-6822 | [0] Key Laboratory of Universal Wireless Communication, Ministry of Education, Beijing University of Posts and Telecommunications, Beijing, China; [1] Key Laboratory of Universal Wireless Communication, Ministry of Education, Beijing University of Posts and Telecommunications, Beijing, China; [2] Key Laboratory of Universal Wireless Communication, Ministry of Education, Beijing University of Posts and Telecommunications, Beijing, China; [3] Department of Electrical and Computer Engineering, George Mason University, Fairfax, VA, USA; [4] Key Laboratory of Universal Wireless Communication, Ministry of Education, Beijing University of Posts and Telecommunications, Beijing, China | Deep clustering is a promising technique for speech separation, a task crucial to speech communication, acoustic target detection, acoustic enhancement, and speech recognition. In monophonic speech separation, the problem is that the separation and generalization performance of the model decreases when the variety of the training data set is reduced. In this paper, we propose a comprehensive deep clustering framework, named graph deep clustering (GDC), that constructs structured speech data based on a graph convolutional network (GCN) to further improve the performance of the separation model. In particular, embedding features are transformed into graph-structured data, and the speech separation mask is obtained by clustering these graph-structured data. Graph structural information aggregates nodes within a class, which makes the feature representations more conducive to clustering. Experimental results demonstrate that the proposed scheme improves clustering performance: the signal-to-distortion ratio (SDR) of the separated speech is improved by about 1.2 dB, and the clustering accuracy is improved by 15%. We also use the perceptually motivated objective measures for the evaluation of audio source separation to score the speech quality. The target speech quality and the overall perceptual score are improved by 10.7% compared with other speech separation algorithms. | https://ieeexplore.ieee.org/document/9076605/ | Construction of graph-structured data; deep clustering; graph convolutional filter; speech separation
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Shan Qin; Ting Jiang; Sheng Wu; Ning Wang; Xinran Zhao
spellingShingle	Shan Qin; Ting Jiang; Sheng Wu; Ning Wang; Xinran Zhao | Graph Convolution-Based Deep Clustering for Speech Separation | IEEE Access | Construction of graph-structured data; deep clustering; graph convolutional filter; speech separation
author_facet	Shan Qin; Ting Jiang; Sheng Wu; Ning Wang; Xinran Zhao
author_sort	Shan Qin
title	Graph Convolution-Based Deep Clustering for Speech Separation
title_sort	graph convolution-based deep clustering for speech separation
publisher	IEEE
series	IEEE Access
issn	2169-3536
publishDate	2020-01-01
description	Deep clustering is a promising technique for speech separation, a task crucial to speech communication, acoustic target detection, acoustic enhancement, and speech recognition. In monophonic speech separation, the problem is that the separation and generalization performance of the model decreases when the variety of the training data set is reduced. In this paper, we propose a comprehensive deep clustering framework, named graph deep clustering (GDC), that constructs structured speech data based on a graph convolutional network (GCN) to further improve the performance of the separation model. In particular, embedding features are transformed into graph-structured data, and the speech separation mask is obtained by clustering these graph-structured data. Graph structural information aggregates nodes within a class, which makes the feature representations more conducive to clustering. Experimental results demonstrate that the proposed scheme improves clustering performance: the signal-to-distortion ratio (SDR) of the separated speech is improved by about 1.2 dB, and the clustering accuracy is improved by 15%. We also use the perceptually motivated objective measures for the evaluation of audio source separation to score the speech quality. The target speech quality and the overall perceptual score are improved by 10.7% compared with other speech separation algorithms.
topic	Construction of graph-structured data; deep clustering; graph convolutional filter; speech separation
url	https://ieeexplore.ieee.org/document/9076605/
work_keys_str_mv	AT shanqin graphconvolutionbaseddeepclusteringforspeechseparation AT tingjiang graphconvolutionbaseddeepclusteringforspeechseparation AT shengwu graphconvolutionbaseddeepclusteringforspeechseparation AT ningwang graphconvolutionbaseddeepclusteringforspeechseparation AT xinranzhao graphconvolutionbaseddeepclusteringforspeechseparation
_version_	1724186472624422912
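
The description in this record outlines a pipeline: time-frequency embeddings are turned into graph-structured data, graph information pulls nodes of the same class together, and clustering yields the separation mask. The following is a minimal NumPy sketch of that idea on toy data, not the authors' GDC model. Everything in it is an assumption made for illustration: the k-nearest-neighbour graph built from cosine similarity, the single parameter-free propagation step standing in for a trained graph convolutional layer, the small k-means routine, and all function names and sizes.

```python
import numpy as np

def knn_adjacency(emb, k):
    """Symmetric k-nearest-neighbour graph (cosine similarity) with self-loops."""
    unit = emb / (np.linalg.norm(emb, axis=1, keepdims=True) + 1e-12)
    sim = unit @ unit.T
    np.fill_diagonal(sim, -np.inf)            # a node is not its own neighbour
    nbrs = np.argsort(-sim, axis=1)[:, :k]    # k most similar nodes per row
    n = emb.shape[0]
    adj = np.zeros((n, n))
    adj[np.repeat(np.arange(n), k), nbrs.ravel()] = 1.0
    adj = np.maximum(adj, adj.T)              # make the graph undirected
    return adj + np.eye(n)                    # add self-loops

def graph_filter(emb, adj):
    """One propagation step with D^{-1/2} A D^{-1/2}; no learned weights."""
    d_inv_sqrt = 1.0 / np.sqrt(adj.sum(axis=1))
    a_hat = adj * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return a_hat @ emb

def kmeans(x, n_clusters, n_iter=20):
    """Plain Lloyd iterations with deterministic farthest-point initialisation."""
    centers = [x[0]]
    for _ in range(1, n_clusters):
        d = np.min([((x - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(x[d.argmax()])
    centers = np.array(centers)

    def assign(c):
        return ((x[:, None, :] - c[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)

    for _ in range(n_iter):
        labels = assign(centers)
        for c in range(n_clusters):
            if np.any(labels == c):           # leave empty clusters where they are
                centers[c] = x[labels == c].mean(axis=0)
    return assign(centers)

# Toy data (hypothetical): a 6 x 10 time-frequency grid, 2 sources, 4-D embeddings.
n_freq, n_frames, emb_dim, n_sources = 6, 10, 4, 2
truth = np.zeros((n_freq, n_frames), dtype=int)
truth[n_freq // 2:, :] = 1                    # lower half source 0, upper half source 1
rng = np.random.default_rng(0)
emb = np.eye(emb_dim)[truth.ravel()] + 0.05 * rng.standard_normal((truth.size, emb_dim))

smoothed = graph_filter(emb, knn_adjacency(emb, k=5))
labels = kmeans(smoothed, n_sources)

# One binary mask per source; each time-frequency bin belongs to exactly one source.
masks = np.stack([(labels == c).reshape(n_freq, n_frames) for c in range(n_sources)])
masks = masks.astype(float)
```

Each mask in `masks` would be multiplied element-wise with the mixture spectrogram to estimate one source. In a trained system the embeddings would come from a network and the propagation step would carry learned weights; the sketch only shows how graph smoothing and clustering fit together.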
