Graph Convolution-Based Deep Clustering for Speech Separation

Deep clustering is a promising technique for speech separation, which is crucial to speech communication, acoustic target detection, acoustic enhancement, and speech recognition. In the study of monophonic speech separation, the problem is the decrease in separation and generalization performance...


Bibliographic Details
Main Authors: Shan Qin, Ting Jiang, Sheng Wu, Ning Wang, Xinran Zhao
Format: Article
Language: English
Published: IEEE 2020-01-01
Series: IEEE Access
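Subjects: Construction of graph-structured data; deep clustering; graph convolutional filter; speech separation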
Online Access: https://ieeexplore.ieee.org/document/9076605/
id doaj-b0791a734ea841bba8246434680072b7
record_format Article
spelling doaj-b0791a734ea841bba8246434680072b7 2021-03-30T01:44:39Z eng
IEEE, IEEE Access, ISSN 2169-3536, 2020-01-01, vol. 8, pp. 82571-82580
DOI 10.1109/ACCESS.2020.2989833, IEEE Xplore document 9076605
Graph Convolution-Based Deep Clustering for Speech Separation
Shan Qin (https://orcid.org/0000-0002-9985-3163), Key Laboratory of Universal Wireless Communication, Ministry of Education, Beijing University of Posts and Telecommunications, Beijing, China
Ting Jiang (https://orcid.org/0000-0003-3598-3804), Key Laboratory of Universal Wireless Communication, Ministry of Education, Beijing University of Posts and Telecommunications, Beijing, China
Sheng Wu (https://orcid.org/0000-0002-9947-9968), Key Laboratory of Universal Wireless Communication, Ministry of Education, Beijing University of Posts and Telecommunications, Beijing, China
Ning Wang (https://orcid.org/0000-0003-1381-7952), Department of Electrical and Computer Engineering, George Mason University, Fairfax, VA, USA
Xinran Zhao (https://orcid.org/0000-0002-6977-6822), Key Laboratory of Universal Wireless Communication, Ministry of Education, Beijing University of Posts and Telecommunications, Beijing, China
collection DOAJ
sources DOAJ
author Shan Qin
Ting Jiang
Sheng Wu
Ning Wang
Xinran Zhao
title Graph Convolution-Based Deep Clustering for Speech Separation
publisher IEEE
series IEEE Access
issn 2169-3536
publishDate 2020-01-01
description Deep clustering is a promising technique for speech separation, which is crucial to speech communication, acoustic target detection, acoustic enhancement, and speech recognition. In the study of monophonic speech separation, the problem is that the separation and generalization performance of the model decreases when the variety of the training data set is reduced. In this paper, we propose a comprehensive deep clustering framework, named graph deep clustering (GDC), that constructs structured speech data based on a graph convolutional network (GCN) to further improve the separation performance of the model. In particular, embedding features are transformed into graph-structured data, and the speech separation mask is obtained by clustering these graph-structured data. Graph structural information aggregates nodes within a class, which makes the feature representations conducive to clustering. Experimental results demonstrate that the proposed scheme can improve the clustering performance. The SDR of the separated speech is improved by about 1.2 dB, and the clustering accuracy is improved by 15%. We also use the perceptually motivated objective measures for the evaluation of audio source separation to score the speech quality. The target speech quality and the overall perceptual score are improved by 10.7% compared with other speech separation algorithms.
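The pipeline outlined in the description (embeddings turned into graph-structured data, a graph convolutional filter, then clustering to obtain the separation mask) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and not taken from the paper: the cosine k-nearest-neighbour graph, the single weight-free propagation step with the symmetric normalisation commonly used in GCNs, the plain k-means step, the synthetic embeddings, and all function names (`knn_adjacency`, `graph_filter`, `kmeans`, `separate_masks`).

```python
import numpy as np


def knn_adjacency(emb, k=5):
    """Symmetric k-nearest-neighbour graph from cosine similarity (assumed construction)."""
    unit = emb / (np.linalg.norm(emb, axis=1, keepdims=True) + 1e-12)
    sim = unit @ unit.T
    np.fill_diagonal(sim, -np.inf)              # never pick a node as its own neighbour
    nbrs = np.argsort(-sim, axis=1)[:, :k]      # k most similar nodes per row
    adj = np.zeros((emb.shape[0], emb.shape[0]))
    rows = np.repeat(np.arange(emb.shape[0]), k)
    adj[rows, nbrs.ravel()] = 1.0
    return np.maximum(adj, adj.T)               # make the graph undirected


def graph_filter(emb, adj):
    """One propagation step D^{-1/2} (A + I) D^{-1/2} X, without learned weights."""
    a_hat = adj + np.eye(adj.shape[0])          # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return (a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]) @ emb


def kmeans(x, n_clusters=2, n_iter=20):
    """Plain k-means with deterministic farthest-point initialisation."""
    centers = [x[0]]
    for _ in range(1, n_clusters):
        dist = np.min([np.sum((x - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(x[np.argmax(dist)])
    centers = np.stack(centers)
    for _ in range(n_iter):
        d = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(n_clusters):
            if np.any(labels == j):
                centers[j] = x[labels == j].mean(axis=0)
    return labels


def separate_masks(emb, n_speakers=2, k=5):
    """Return one-hot (binary) masks, one column per speaker, one row per embedding."""
    smoothed = graph_filter(emb, knn_adjacency(emb, k))
    labels = kmeans(smoothed, n_speakers)
    return np.eye(n_speakers)[labels]


# Synthetic stand-in for per-bin embeddings: two noisy groups, one per speaker.
rng = np.random.default_rng(0)
truth = np.repeat([0, 1], 100)
means = np.array([[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]])
emb = means[truth] + 0.2 * rng.standard_normal((200, 4))
masks = separate_masks(emb)
```

In a real system each row of `emb` would be the embedding of one time-frequency bin produced by a trained network, and each column of `masks` would be reshaped to the spectrogram size and applied to the mixture. The propagation step averages each node with its neighbours, which is one way to read the claim that graph structure pulls same-class nodes together before clustering.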
topic Construction of graph-structured data
deep clustering
graph convolutional filter
speech separation
url https://ieeexplore.ieee.org/document/9076605/