Optimizing text-independent speaker recognition using an LSTM neural network

This paper introduces a novel speaker recognition system. With advances in computer science, automated speaker recognition has become increasingly popular as an aid in crime investigations and authorization processes. Here, a recurrent neural network approach is used to learn to identify ten s...

Bibliographic Details
Main Author: Larsson, Joel
Format: Others
Language: English
Published: Mälardalens högskola, Akademin för innovation, design och teknik 2014
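Subjects: speaker recognition; speaker identification; text-independent; long short-term memory; lstm; mel frequency cepstral coefficients; mfcc; recurrent neural network; speech processing; spectral analysis; rnnlib; htktoolkit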
Online Access: http://urn.kb.se/resolve?urn=urn:nbn:se:mdh:diva-26312
id ndltd-UPSALLA1-oai-DiVA.org-mdh-26312
record_format oai_dc
spelling ndltd-UPSALLA1-oai-DiVA.org-mdh-26312 2014-10-31T04:56:26Z Optimizing text-independent speaker recognition using an LSTM neural network eng Larsson, Joel Mälardalens högskola, Akademin för innovation, design och teknik 2014 speaker recognition speaker identification text-independent long short-term memory lstm mel frequency cepstral coefficients mfcc recurrent neural network speech processing spectral analysis rnnlib htktoolkit This paper introduces a novel speaker recognition system. With advances in computer science, automated speaker recognition has become increasingly popular as an aid in crime investigations and authorization processes. Here, a recurrent neural network approach is used to learn to identify ten speakers within a set of 21 audio books. Audio signals are processed via spectral analysis into Mel Frequency Cepstral Coefficients, which serve as speaker-specific features and are input to the neural network. The Long Short-Term Memory algorithm is examined for the first time within this area, with interesting results. Experiments are carried out to find the optimum network model for the problem. These show that the network learns to identify the speakers well, text-independently, when the recording situation is the same. However, the system has problems recognizing speakers from different recordings, which is probably due to the noise sensitivity of the speech processing algorithm in use. Student thesis info:eu-repo/semantics/bachelorThesis text http://urn.kb.se/resolve?urn=urn:nbn:se:mdh:diva-26312 application/pdf info:eu-repo/semantics/openAccess
collection NDLTD
language English
format Others
sources NDLTD
topic speaker recognition
speaker identification
text-independent
long short-term memory
lstm
mel frequency cepstral coefficients
mfcc
recurrent neural network
speech processing
spectral analysis
rnnlib
htktoolkit
description This paper introduces a novel speaker recognition system. With advances in computer science, automated speaker recognition has become increasingly popular as an aid in crime investigations and authorization processes. Here, a recurrent neural network approach is used to learn to identify ten speakers within a set of 21 audio books. Audio signals are processed via spectral analysis into Mel Frequency Cepstral Coefficients, which serve as speaker-specific features and are input to the neural network. The Long Short-Term Memory algorithm is examined for the first time within this area, with interesting results. Experiments are carried out to find the optimum network model for the problem. These show that the network learns to identify the speakers well, text-independently, when the recording situation is the same. However, the system has problems recognizing speakers from different recordings, which is probably due to the noise sensitivity of the speech processing algorithm in use.
author Larsson, Joel
title Optimizing text-independent speaker recognition using an LSTM neural network
publisher Mälardalens högskola, Akademin för innovation, design och teknik
publishDate 2014
url http://urn.kb.se/resolve?urn=urn:nbn:se:mdh:diva-26312