Speaker diarization system using HXLPS and deep neural network

Bibliographic Details
Main Authors:	V. Subba Ramaiah, R. Rajeswara Rao
Format:	Article
Language:	English
Published:	Elsevier 2018-03-01
Series:	Alexandria Engineering Journal
Online Access:	http://www.sciencedirect.com/science/article/pii/S1110016816303416

id	doaj-3644fea7c2974a1caf051cef12ebb7f1
record_format	Article
spelling	doaj-3644fea7c2974a1caf051cef12ebb7f1; 2021-06-02T02:30:16Z; eng; Elsevier; Alexandria Engineering Journal; ISSN 1110-0168; 2018-03-01; vol. 57, no. 1, pp. 255–266; Speaker diarization system using HXLPS and deep neural network; V. Subba Ramaiah (Mahatma Gandhi Institute of Technology, Kokapet, Hyderabad, Telangana 500075, India; corresponding author); R. Rajeswara Rao (JNTUK-UCEV, Kakinada, Andhra Pradesh 535002, India); http://www.sciencedirect.com/science/article/pii/S1110016816303416
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	V. Subba Ramaiah, R. Rajeswara Rao
spellingShingle	V. Subba Ramaiah R. Rajeswara Rao Speaker diarization system using HXLPS and deep neural network Alexandria Engineering Journal
author_facet	V. Subba Ramaiah, R. Rajeswara Rao
author_sort	V. Subba Ramaiah
title	Speaker diarization system using HXLPS and deep neural network
title_short	Speaker diarization system using HXLPS and deep neural network
title_full	Speaker diarization system using HXLPS and deep neural network
title_fullStr	Speaker diarization system using HXLPS and deep neural network
title_full_unstemmed	Speaker diarization system using HXLPS and deep neural network
title_sort	speaker diarization system using hxlps and deep neural network
publisher	Elsevier
series	Alexandria Engineering Journal
issn	1110-0168
publishDate	2018-03-01
description	In general, speaker diarization is defined as the process of segmenting the input speech signal and grouping the homogeneous regions according to speaker identity. The main idea behind such a system is that it can discriminate between speakers by assigning a label to each speaker's signal. With the rapid growth of broadcast and meeting recordings, speaker diarization, which is needed to enhance the readability of speech transcriptions, has become a burdensome task. To address this issue, a speaker diarization system based on Holoentropy with eXtended Linear Prediction using autocorrelation Snapshot (HXLPS) features and a deep neural network (DNN) is proposed. The HXLPS feature extraction method is newly developed by combining Holoentropy with XLPS. Once the features are extracted, speech and non-speech regions are detected using Voice Activity Detection (VAD). Then, an i-vector representation of every segment is obtained using a Universal Background Model (UBM). Next, the DNN assigns a speaker label to each segment, and the segments are clustered according to their speaker labels. Performance is analysed using evaluation metrics such as tracking distance, false alarm rate, and diarization error rate (DER). The proposed method achieves better diarization performance, with DERs as low as 1.36% with respect to the lambda value and 2.23% with respect to the frame length. Keywords: Speaker diarization, HXLPS feature extraction, Voice activity detection, Deep neural network, Speaker clustering, Diarization Error Rate (DER)
url	http://www.sciencedirect.com/science/article/pii/S1110016816303416
work_keys_str_mv	AT vsubbaramaiah speakerdiarizationsystemusinghxlpsanddeepneuralnetwork AT rrajeswararao speakerdiarizationsystemusinghxlpsanddeepneuralnetwork
_version_	1721409268460027904
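The abstract reports results as diarization error rate (DER). For orientation only, and not as the authors' implementation, the standard DER definition (missed speech + false alarm speech + speaker confusion, divided by total reference speech) can be sketched at frame level in Python. The sketch assumes frame-aligned label sequences, at most one speaker per frame, no scoring collar, and `None` for non-speech frames; the function name `diarization_error_rate` is hypothetical. Because hypothesis labels are arbitrary, speaker confusion is measured after the best one-to-one mapping between hypothesis and reference speakers.

```python
from collections import Counter
from itertools import permutations
from typing import Optional, Sequence

Label = Optional[str]  # None marks a non-speech frame (assumption of this sketch)


def diarization_error_rate(reference: Sequence[Label], hypothesis: Sequence[Label]) -> float:
    """Frame-level DER = (missed + false alarm + confusion) / reference speech frames.

    Hypothetical helper: assumes equal-length, frame-aligned label sequences,
    at most one speaker per frame, and no forgiveness collar.
    """
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must have the same number of frames")

    speech = sum(1 for r in reference if r is not None)
    if speech == 0:
        raise ValueError("reference contains no speech frames")

    # Missed speech: reference has a speaker, hypothesis says non-speech.
    missed = sum(1 for r, h in zip(reference, hypothesis) if r is not None and h is None)
    # False alarm: reference says non-speech, hypothesis has a speaker.
    false_alarm = sum(1 for r, h in zip(reference, hypothesis) if r is None and h is not None)

    # Co-occurrence counts of (reference speaker, hypothesis speaker) on frames
    # where both sides detect speech.
    overlap = Counter((r, h) for r, h in zip(reference, hypothesis) if r is not None and h is not None)
    both_speech = sum(overlap.values())

    ref_spk = sorted({r for r in reference if r is not None})
    hyp_spk = sorted({h for h in hypothesis if h is not None})

    # Find the one-to-one speaker mapping that maximises matched frames
    # (brute force over permutations; adequate for a handful of speakers).
    best_match = 0
    if len(hyp_spk) <= len(ref_spk):
        for perm in permutations(ref_spk, len(hyp_spk)):
            best_match = max(best_match, sum(overlap[(r, h)] for r, h in zip(perm, hyp_spk)))
    else:
        for perm in permutations(hyp_spk, len(ref_spk)):
            best_match = max(best_match, sum(overlap[(r, h)] for r, h in zip(ref_spk, perm)))

    # Speech frames detected by both sides but attributed to the wrong speaker.
    confusion = both_speech - best_match
    return (missed + false_alarm + confusion) / speech


if __name__ == "__main__":
    ref = ["A", "A", "A", "B", "B", None, None, "B"]
    hyp = ["x", "x", "y", "y", "y", "y", None, None]
    print(diarization_error_rate(ref, hyp))  # 0.5
```

In the usage example there are 6 reference speech frames, with 1 missed frame, 1 false-alarm frame, and 1 confused frame under the best mapping (A→x, B→y), so DER = 3/6 = 0.5. Standard scoring tools work on time segments rather than frames and handle overlapping speech and collars, which this sketch deliberately omits.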
