Optimizing text-independent speaker recognition using an LSTM neural network
In this paper a novel speaker recognition system is introduced. Automated speaker recognition has become increasingly popular to aid in crime investigations and authorization processes with the advances in computer science. Here, a recurrent neural network approach is used to learn to identify ten s...
Main Author: | Larsson, Joel |
---|---|
Format: | Others |
Language: | English |
Published: | Mälardalens högskola, Akademin för innovation, design och teknik, 2014 |
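Subjects: | speaker recognition, speaker identification, text-independent, long short-term memory, lstm, mel frequency cepstral coefficients, mfcc, recurrent neural network, speech processing, spectral analysis, rnnlib, htktoolkit |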
Online Access: | http://urn.kb.se/resolve?urn=urn:nbn:se:mdh:diva-26312 |
id |
ndltd-UPSALLA1-oai-DiVA.org-mdh-26312 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-UPSALLA1-oai-DiVA.org-mdh-26312; 2014-10-31T04:56:26Z; Optimizing text-independent speaker recognition using an LSTM neural network; eng; Larsson, Joel; Mälardalens högskola, Akademin för innovation, design och teknik; 2014; speaker recognition; speaker identification; text-independent; long short-term memory; lstm; mel frequency cepstral coefficients; mfcc; recurrent neural network; speech processing; spectral analysis; rnnlib; htktoolkit; Student thesis; info:eu-repo/semantics/bachelorThesis; text; http://urn.kb.se/resolve?urn=urn:nbn:se:mdh:diva-26312; application/pdf; info:eu-repo/semantics/openAccess |
collection |
NDLTD |
language |
English |
format |
Others |
sources |
NDLTD |
topic |
speaker recognition, speaker identification, text-independent, long short-term memory, lstm, mel frequency cepstral coefficients, mfcc, recurrent neural network, speech processing, spectral analysis, rnnlib, htktoolkit |
description |
This paper introduces a novel speaker recognition system. With advances in computer science, automated speaker recognition has become increasingly popular as an aid in crime investigations and authorization processes. Here, a recurrent neural network approach is used to learn to identify ten speakers within a set of 21 audio books. Audio signals are processed via spectral analysis into Mel Frequency Cepstral Coefficients, which serve as speaker-specific features and are input to the neural network. The Long Short-Term Memory algorithm is examined for the first time within this area, with interesting results. Experiments are conducted to find the optimal network model for the problem. They show that the network learns to identify the speakers well, text-independently, when the recording situation is the same. However, the system has difficulty recognizing speakers from different recordings, which is probably due to the noise sensitivity of the speech processing algorithm in use. |
author |
Larsson, Joel |
title |
Optimizing text-independent speaker recognition using an LSTM neural network |
publisher |
Mälardalens högskola, Akademin för innovation, design och teknik |
publishDate |
2014 |
url |
http://urn.kb.se/resolve?urn=urn:nbn:se:mdh:diva-26312 |
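
The abstract describes a pipeline in which per-frame Mel Frequency Cepstral Coefficients are fed to a Long Short-Term Memory network that picks one of ten speakers. The following is a minimal sketch of that idea only, not the thesis implementation: the record does not give the network layout, so the single LSTM layer, the 13 coefficients per frame, the 16 hidden units, the classification from the final hidden state, and all function names (`init_params`, `lstm_classify`) are assumptions made for illustration. The weights are random and untrained, and the input frames are random stand-ins for real MFCC vectors.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def rand_matrix(rng, rows, cols, scale=0.1):
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def init_params(n_features, hidden, n_speakers, seed=0):
    """Random (untrained) weights; all sizes are illustrative assumptions."""
    rng = random.Random(seed)
    return {
        "W": rand_matrix(rng, 4 * hidden, n_features),  # input weights
        "U": rand_matrix(rng, 4 * hidden, hidden),      # recurrent weights
        "b": [0.0] * (4 * hidden),
        "W_out": rand_matrix(rng, n_speakers, hidden),  # output layer
        "b_out": [0.0] * n_speakers,
        "hidden": hidden,
    }


def lstm_classify(frames, p):
    """Run one LSTM layer over the frames, then softmax over speakers."""
    n = p["hidden"]
    h = [0.0] * n  # hidden state
    c = [0.0] * n  # cell state
    for x in frames:
        wx = matvec(p["W"], x)
        uh = matvec(p["U"], h)
        z = [a + r + bias for a, r, bias in zip(wx, uh, p["b"])]
        i = [sigmoid(v) for v in z[:n]]           # input gate
        f = [sigmoid(v) for v in z[n:2 * n]]      # forget gate
        o = [sigmoid(v) for v in z[2 * n:3 * n]]  # output gate
        g = [math.tanh(v) for v in z[3 * n:]]     # candidate cell values
        c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
        h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    logits = [s + bias for s, bias in zip(matvec(p["W_out"], h), p["b_out"])]
    top = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    rng = random.Random(1)
    # 50 frames of 13 values each, standing in for MFCC vectors.
    frames = [[rng.gauss(0.0, 1.0) for _ in range(13)] for _ in range(50)]
    probs = lstm_classify(frames, init_params(13, 16, 10))
    print("predicted speaker index:", max(range(10), key=probs.__getitem__))
```

Because the state `h` and `c` is carried across frames, the prediction depends on the whole sequence rather than on any single frame, which is the property that makes a recurrent model a natural fit for text-independent identification. With untrained weights the predicted index is arbitrary; a real system would fit the parameters on labelled recordings.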