End-to-end Convolutional Neural Networks for Sound Event Detection in Urban Environments

We present a novel approach to tackle the problem of sound event detection (SED) in urban environments using end-to-end convolutional neural networks (CNN). It consists of a 1D CNN for extracting the energy on mel-frequency bands from the audio signal based on a simple filter bank, followed by a 2D...

Bibliographic Details
Main Authors: Pablo Zinemanas, Pablo Cancela, Martin Rocamora
Format: Article
Language: English
Published: FRUCT 2019-04-01
Series: Proceedings of the XXth Conference of Open Innovations Association FRUCT
Subjects: Sound event detection, Urban sound environments, End-to-end networks, Signal processing
Online Access: https://fruct.org/publications/fruct24/files/Zin.pdf
id doaj-2b860a11a63642d2a44dc1fcc40f0213
record_format Article
spelling doaj-2b860a11a63642d2a44dc1fcc40f0213 | 2020-11-25T02:19:07Z | eng | FRUCT | Proceedings of the XXth Conference of Open Innovations Association FRUCT | 2305-7254 | 2343-0737 | 2019-04-01 | 85424533539 | End-to-end Convolutional Neural Networks for Sound Event Detection in Urban Environments | Pablo Zinemanas 0 | Pablo Cancela 1 | Martin Rocamora 2 | Universidad de la Republica, Montevideo, Uruguay | Universidad de la Republica, Montevideo, Uruguay | Universidad de la Republica, Montevideo, Uruguay | We present a novel approach to tackle the problem of sound event detection (SED) in urban environments using end-to-end convolutional neural networks (CNN). It consists of a 1D CNN for extracting the energy on mel-frequency bands from the audio signal based on a simple filter bank, followed by a 2D CNN for the classification task. The main goal of this two-stage architecture is to bring more interpretability to the first layers of the network and to permit their reutilization in other problems of the same domain. We present a novel model to calculate the mel-spectrogram using a neural network that outperforms an existing work, both in its simplicity and its matching performance. Also, we implement a recently proposed approach to normalize the energy of the mel-spectrogram (per-channel energy normalization, PCEN) as a layer of the neural network. We show how the parameters of this normalization can be learned by the network and why this is useful for SED in urban environments. We study how the training modifies the filter bank as well as the PCEN normalization parameters. The obtained system achieves classification results that are comparable to the state of the art, while decreasing the number of parameters involved. | https://fruct.org/publications/fruct24/files/Zin.pdf | Sound event detection | Urban sound environments | End-to-end networks | Signal processing
collection DOAJ
language English
format Article
sources DOAJ
author Pablo Zinemanas
Pablo Cancela
Martin Rocamora
spellingShingle Pablo Zinemanas
Pablo Cancela
Martin Rocamora
End-to-end Convolutional Neural Networks for Sound Event Detection in Urban Environments
Proceedings of the XXth Conference of Open Innovations Association FRUCT
Sound event detection
Urban sound environments
End-to-end networks
Signal processing
author_facet Pablo Zinemanas
Pablo Cancela
Martin Rocamora
author_sort Pablo Zinemanas
title End-to-end Convolutional Neural Networks for Sound Event Detection in Urban Environments
title_short End-to-end Convolutional Neural Networks for Sound Event Detection in Urban Environments
title_full End-to-end Convolutional Neural Networks for Sound Event Detection in Urban Environments
title_fullStr End-to-end Convolutional Neural Networks for Sound Event Detection in Urban Environments
title_full_unstemmed End-to-end Convolutional Neural Networks for Sound Event Detection in Urban Environments
title_sort end-to-end convolutional neural networks for sound event detection in urban environments
publisher FRUCT
series Proceedings of the XXth Conference of Open Innovations Association FRUCT
issn 2305-7254
2343-0737
publishDate 2019-04-01
description We present a novel approach to tackle the problem of sound event detection (SED) in urban environments using end-to-end convolutional neural networks (CNN). It consists of a 1D CNN for extracting the energy on mel-frequency bands from the audio signal based on a simple filter bank, followed by a 2D CNN for the classification task. The main goal of this two-stage architecture is to bring more interpretability to the first layers of the network and to permit their reutilization in other problems of the same domain. We present a novel model to calculate the mel-spectrogram using a neural network that outperforms an existing work, both in its simplicity and its matching performance. Also, we implement a recently proposed approach to normalize the energy of the mel-spectrogram (per-channel energy normalization, PCEN) as a layer of the neural network. We show how the parameters of this normalization can be learned by the network and why this is useful for SED in urban environments. We study how the training modifies the filter bank as well as the PCEN normalization parameters. The obtained system achieves classification results that are comparable to the state of the art, while decreasing the number of parameters involved.
topic Sound event detection
Urban sound environments
End-to-end networks
Signal processing
url https://fruct.org/publications/fruct24/files/Zin.pdf
work_keys_str_mv AT pablozinemanas endtoendconvolutionalneuralnetworksforsoundeventdetectioninurbanenvironments
AT pablocancela endtoendconvolutionalneuralnetworksforsoundeventdetectioninurbanenvironments
AT martinrocamora endtoendconvolutionalneuralnetworksforsoundeventdetectioninurbanenvironments
_version_ 1724878297291030528
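
The description in this record mentions per-channel energy normalization (PCEN) applied to the mel-spectrogram as a network layer with learnable parameters. As an illustration only, the sketch below computes PCEN in its commonly published form, PCEN(t, f) = (E(t, f) / (eps + M(t, f))^alpha + delta)^r - delta^r, where M(t, f) = (1 - s) M(t-1, f) + s E(t, f) is a smoothed version of the energy E. The function name pcen, the fixed scalar parameter values, and the first-frame initialisation of the smoother are assumptions made for this sketch and do not come from this record; according to the description, the paper learns these parameters during training rather than fixing them.

```python
import numpy as np


def pcen(E, s=0.025, alpha=0.98, delta=2.0, r=0.5, eps=1e-6):
    """Per-channel energy normalization of a mel-energy matrix.

    E has shape (n_frames, n_bands) and holds non-negative energies.
    All parameter defaults are illustrative assumptions, not values
    taken from the catalogued paper.
    """
    E = np.asarray(E, dtype=float)
    M = np.empty_like(E)
    # Assumption: start the smoother from the first frame.
    M[0] = E[0]
    # First-order recursive smoothing along time, per band.
    for t in range(1, E.shape[0]):
        M[t] = (1.0 - s) * M[t - 1] + s * E[t]
    # Adaptive gain control followed by root compression.
    return (E / (eps + M) ** alpha + delta) ** r - delta ** r


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mel_energy = rng.random((100, 40))  # stand-in for a mel-spectrogram
    print(pcen(mel_energy).shape)
```

In a trainable version, s, alpha, delta and r would typically be per-band tensors updated by backpropagation; the fixed scalars here only show the arithmetic of the normalization.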