Sequence Models for Speech and Music Detection in Radio Broadcast

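Speech and music detection is an important metadata extraction step for radio broadcasters. It provides them with accurate time-stamping of the audio, including the parts where speech and music overlap. One important application is royalty collection for broadcast audio, which is the use case of this study. The study focuses on deep neural network architectures designed to process sequential data, such as recurrent neural networks and convolutional architectures for sequence learning. Several architectures that have not previously been applied to this task are evaluated and compared with a state-of-the-art architecture, the Bidirectional Long Short-Term Memory network. The study also evaluates different strategies for taking advantage of both low- and high-quality datasets. The results show that Temporal Convolution Network (TCN) architectures can outperform state-of-the-art architectures, and that non-causal TCNs in particular lead to a significant improvement in accuracy. The code used for this study has been made available on GitHub.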
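
The abstract attributes a significant accuracy gain to non-causal TCNs. As an illustration of what "non-causal" means here (a minimal sketch, not code from the thesis or its GitHub repository), the Python below implements one single-channel dilated 1-D convolution, the building block that TCNs stack. With causal padding, the output at frame t depends only on frames up to t; with non-causal (centred) padding it also sees future frames, which requires the whole recording, or a look-ahead buffer, to be available. The function names `dilated_conv1d` and `receptive_field` and the example dilations are hypothetical; real TCN layers add multiple channels, nonlinearities and residual connections.

```python
from typing import List, Sequence


def dilated_conv1d(x: Sequence[float], kernel: Sequence[float],
                   dilation: int, causal: bool) -> List[float]:
    """Single-channel dilated 1-D convolution with zero padding.

    Computed as a cross-correlation (as deep learning libraries do); the
    output has the same length as x.
    causal=True:  output[t] reads x[t - (k-1)*dilation], ..., x[t] only.
    causal=False: the taps are centred on t, so output[t] also reads future frames.
    """
    k = len(kernel)
    span = (k - 1) * dilation             # distance between first and last tap
    left = span if causal else span // 2  # how far back the first tap reaches
    out: List[float] = []
    for t in range(len(x)):
        acc = 0.0
        for i, w in enumerate(kernel):
            j = t - left + i * dilation   # input frame read by tap i
            if 0 <= j < len(x):           # frames outside the sequence count as zero
                acc += w * x[j]
        out.append(acc)
    return out


def receptive_field(kernel_size: int, dilations: Sequence[int]) -> int:
    """Number of input frames seen by one output frame of a stack of layers."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


if __name__ == "__main__":
    x = [0.0] * 11
    x[5] = 1.0  # unit impulse at frame 5
    kernel = [1.0, 1.0, 1.0]
    causal_out = dilated_conv1d(x, kernel, dilation=2, causal=True)
    centred_out = dilated_conv1d(x, kernel, dilation=2, causal=False)
    print([t for t, v in enumerate(causal_out) if v != 0.0])   # [5, 7, 9]
    print([t for t, v in enumerate(centred_out) if v != 0.0])  # [3, 5, 7]
    print(receptive_field(3, [1, 2, 4, 8]))                    # 31
```

Feeding a unit impulse shows which output frames an input frame reaches. In the causal layer only the current and later outputs react; in the non-causal layer, output frame 3 already reacts to the impulse at frame 5, so it uses two frames of future context. Doubling the dilation per layer (here 1, 2, 4, 8 with kernel size 3) lets a short stack cover 31 frames.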

Bibliographic Details
Main Author:	Lemaire, Quentin
Format:	Others
Language:	English
Published:	KTH, Skolan för elektroteknik och datavetenskap (EECS; School of Electrical Engineering and Computer Science), 2019
Subjects:	Computer and Information Sciences
Online Access:	http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-251011

id	ndltd-UPSALLA1-oai-DiVA.org-kth-251011
record_format	oai_dc
datestamp	2019-05-16T03:08:22Z
type	Student thesis (info:eu-repo/semantics/bachelorThesis)
series	TRITA-EECS-EX ; 2019:86
mime_type	application/pdf
access	info:eu-repo/semantics/openAccess
collection	NDLTD
