Acoustic Scene Classification With Squeeze-Excitation Residual Networks

Acoustic scene classification (ASC) is a machine listening problem whose objective is to classify (tag) an audio clip with one of a set of predefined labels describing a scene location (e.g., park, airport). Many state-of-the-art solutions to ASC incorporate data augmentation techniques a...


Bibliographic Details
Main Authors:	Javier Naranjo-Alcazar, Sergi Perez-Castanos, Pedro Zuccarello, Maximo Cobos
Format:	Article
Language:	English
Published:	IEEE 2020-01-01
Series:	IEEE Access
Subjects:	Acoustic scene classification; deep learning; machine listening; pattern recognition; squeeze-excitation
Online Access:	https://ieeexplore.ieee.org/document/9118879/
DOI:	10.1109/ACCESS.2020.3002761
Volume:	8
Pages:	112287-112296
ISSN:	2169-3536
Affiliations:	Javier Naranjo-Alcazar, Sergi Perez-Castanos, Pedro Zuccarello: Visualfy, Benisanó, Spain; Maximo Cobos: Computer Science Department, Universitat de Valencia, Burjassot, Spain
ORCID:	Javier Naranjo-Alcazar, https://orcid.org/0000-0001-7503-1272
Source:	DOAJ
Description:	Acoustic scene classification (ASC) is a machine listening problem whose objective is to classify (tag) an audio clip with one of a set of predefined labels describing a scene location (e.g., park, airport). Many state-of-the-art solutions to ASC incorporate data augmentation techniques and model ensembles. However, considerable improvements can also be achieved solely by modifying the architecture of convolutional neural networks (CNNs). In this work, we propose two novel squeeze-excitation blocks to improve the accuracy of a CNN-based ASC framework built on residual learning. The main idea of squeeze-excitation blocks is to learn spatial and channel-wise feature maps independently, instead of jointly as standard CNNs do. This is usually achieved by combining global grouping operators, linear operators, and a final calibration between the input of the block and its learned relationships. The behavior of the block that implements these operators, and therefore of the entire neural network, can be modified through the input to the block, the residual configuration used, and the selected non-linear activations. The analysis was carried out on the TAU Urban Acoustic Scenes 2019 dataset presented in the 2019 edition of the Detection and Classification of Acoustic Scenes and Events (DCASE) challenge. All configurations discussed in this document exceed the performance of the baseline proposed by the DCASE organization by 13 percentage points. In turn, the novel configurations proposed in this paper outperform the residual configurations proposed in previous works.
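The description summarizes squeeze-excitation in general terms only. It mentions global grouping (squeeze), linear operators (excitation), a final calibration of the block input, and separate spatial and channel-wise treatment inside a residual network. The sketch below illustrates those generic ideas in dependency-free Python. It is not the authors' two proposed blocks, whose exact configuration is not given in this record.

The following details are hypothetical choices made for illustration:
- The function names `channel_se`, `spatial_se` and `se_residual_block`.
- Bias-free linear layers and a reduction ratio of 2.
- Summing the channel and spatial recalibrations (other aggregations, such as an element-wise max, are also used in the literature).
- An identity shortcut followed by ReLU.
- A `branch` callable standing in for the convolutional layers.

```python
import math
from typing import Callable, List

# A feature map is C channels, each an H x W grid (e.g. frequency x time).
FeatureMap = List[List[List[float]]]
Matrix = List[List[float]]


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def matvec(w: Matrix, x: List[float]) -> List[float]:
    """Bias-free linear operator: returns w @ x."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def channel_se(x: FeatureMap, w1: Matrix, w2: Matrix) -> FeatureMap:
    """Channel squeeze-excitation (sketch).

    Squeeze: global average pooling gives one descriptor per channel.
    Excitation: linear C -> C/r, ReLU, linear C/r -> C, sigmoid.
    Calibration: every channel of x is rescaled by its gate in (0, 1).
    """
    squeezed = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in x]
    hidden = [max(0.0, v) for v in matvec(w1, squeezed)]
    gates = [sigmoid(v) for v in matvec(w2, hidden)]
    return [[[g * v for v in row] for row in ch] for g, ch in zip(gates, x)]


def spatial_se(x: FeatureMap, w_proj: List[float]) -> FeatureMap:
    """Spatial squeeze-excitation (sketch).

    A 1x1 projection (one weight per channel) squeezes the channels at each
    bin; a sigmoid turns it into a per-bin gate shared by all channels.
    """
    h, w = len(x[0]), len(x[0][0])
    gate = [[sigmoid(sum(wc * ch[i][j] for wc, ch in zip(w_proj, x)))
             for j in range(w)] for i in range(h)]
    return [[[gate[i][j] * ch[i][j] for j in range(w)] for i in range(h)]
            for ch in x]


def se_residual_block(x: FeatureMap,
                      branch: Callable[[FeatureMap], FeatureMap],
                      w1: Matrix, w2: Matrix,
                      w_proj: List[float]) -> FeatureMap:
    """Hypothetical block: relu(x + cSE(F(x)) + sSE(F(x))).

    `branch` stands in for the residual branch F (convolutions in a real
    network); the identity shortcut adds the unmodified input back.
    """
    f = branch(x)
    c, s = channel_se(f, w1, w2), spatial_se(f, w_proj)
    return [[[max(0.0, xv + cv + sv) for xv, cv, sv in zip(xr, cr, sr)]
             for xr, cr, sr in zip(xc, cc, sc)]
            for xc, cc, sc in zip(x, c, s)]


if __name__ == "__main__":
    # Two channels of a 2 x 2 map; toy weights with reduction ratio r = 2.
    x = [[[1.0, 2.0], [3.0, 4.0]],
         [[0.5, 0.5], [0.5, 0.5]]]
    y = se_residual_block(x, lambda t: t,
                          w1=[[0.5, 0.5]], w2=[[1.0], [-1.0]],
                          w_proj=[0.1, 0.1])
    print([[[round(v, 3) for v in row] for row in ch] for ch in y])
```

With these toy weights, the channel gate boosts channel 0 (gate sigmoid(1.5)) and suppresses channel 1 (gate sigmoid(-1.5)). This is the input-dependent recalibration the description refers to. In a trained network, `w1`, `w2` and `w_proj` would be learned parameters rather than hand-picked constants.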

Similar Items