A Study of Features and Deep Neural Network Architectures and Hyper-Parameters for Domestic Audio Classification

Recent methodologies for audio classification frequently involve cepstral and spectral features applied to single-channel recordings of acoustic scenes and events. Further, the concept of transfer learning has been widely used over the years and has proven to be an efficient alternative to training neural networks from scratch. The lower time and resource requirements of pre-trained models allow for more versatility in developing classification systems. However, information on classification performance when using different features for multi-channel recordings is often limited. Furthermore, pre-trained networks are initially trained on large databases and are often unnecessarily large. This poses a challenge when developing systems for devices with limited computational resources, such as mobile or embedded devices. This paper presents a detailed study of the most prominent and widely used cepstral and spectral features for multi-channel audio applications. Accordingly, we propose the use of spectro-temporal features. Additionally, the paper details the development of a compact version of the AlexNet model for computationally limited platforms through studies of performance under various architectural and parameter modifications of the original network. The aim is to minimize the network size while maintaining the series network architecture and preserving the classification accuracy. Considering that other state-of-the-art compact networks present complex directed acyclic graphs, a series architecture offers an advantage in customizability. Experimentation was carried out in MATLAB, using a database that we generated for this task, which consists of four-channel synthetic recordings of both sound events and scenes. The top-performing methodology resulted in a weighted F1-score of 87.92% for scalogram features classified via the modified AlexNet-33 network, which has a size of 14.33 MB. The original AlexNet network returned 86.24% at a size of 222.71 MB.

Bibliographic Details
Main Authors: Abigail Copiaco, Christian Ritz, Nidhal Abdulaziz, Stefano Fasciani
Format: Article
Language: English
Published: MDPI AG, 2021-05-01
Series: Applied Sciences
Subjects: neural network; transfer learning; scalograms; MFCC; Log-mel; pre-trained models
Online Access: https://www.mdpi.com/2076-3417/11/11/4880
id doaj-f8cdd6f06bbc41a1931cf55967f9ce48
record_format Article
spelling doaj-f8cdd6f06bbc41a1931cf55967f9ce48
2021-06-01T01:10:33Z
eng
MDPI AG
Applied Sciences, ISSN 2076-3417, 2021-05-01, Volume 11, Issue 11, Article 4880
10.3390/app11114880
A Study of Features and Deep Neural Network Architectures and Hyper-Parameters for Domestic Audio Classification
Abigail Copiaco (Faculty of Engineering and Information Sciences, University of Wollongong in Dubai, Dubai 20183, United Arab Emirates)
Christian Ritz (School of Electrical, Computer and Telecommunications Engineering, University of Wollongong, Northfields Ave, Wollongong, NSW 2522, Australia)
Nidhal Abdulaziz (Faculty of Engineering and Information Sciences, University of Wollongong in Dubai, Dubai 20183, United Arab Emirates)
Stefano Fasciani (Department of Musicology, University of Oslo, Sem Sælands vei 2, 0371 Oslo, Norway)
Recent methodologies for audio classification frequently involve cepstral and spectral features applied to single-channel recordings of acoustic scenes and events. Further, the concept of transfer learning has been widely used over the years and has proven to be an efficient alternative to training neural networks from scratch. The lower time and resource requirements of pre-trained models allow for more versatility in developing classification systems. However, information on classification performance when using different features for multi-channel recordings is often limited. Furthermore, pre-trained networks are initially trained on large databases and are often unnecessarily large. This poses a challenge when developing systems for devices with limited computational resources, such as mobile or embedded devices. This paper presents a detailed study of the most prominent and widely used cepstral and spectral features for multi-channel audio applications. Accordingly, we propose the use of spectro-temporal features. Additionally, the paper details the development of a compact version of the AlexNet model for computationally limited platforms through studies of performance under various architectural and parameter modifications of the original network. The aim is to minimize the network size while maintaining the series network architecture and preserving the classification accuracy. Considering that other state-of-the-art compact networks present complex directed acyclic graphs, a series architecture offers an advantage in customizability. Experimentation was carried out in MATLAB, using a database that we generated for this task, which consists of four-channel synthetic recordings of both sound events and scenes. The top-performing methodology resulted in a weighted F1-score of 87.92% for scalogram features classified via the modified AlexNet-33 network, which has a size of 14.33 MB. The original AlexNet network returned 86.24% at a size of 222.71 MB.
https://www.mdpi.com/2076-3417/11/11/4880
neural network; transfer learning; scalograms; MFCC; Log-mel; pre-trained models
collection DOAJ
language English
format Article
sources DOAJ
author Abigail Copiaco
Christian Ritz
Nidhal Abdulaziz
Stefano Fasciani
spellingShingle Abigail Copiaco
Christian Ritz
Nidhal Abdulaziz
Stefano Fasciani
A Study of Features and Deep Neural Network Architectures and Hyper-Parameters for Domestic Audio Classification
Applied Sciences
neural network
transfer learning
scalograms
MFCC
Log-mel
pre-trained models
author_facet Abigail Copiaco
Christian Ritz
Nidhal Abdulaziz
Stefano Fasciani
author_sort Abigail Copiaco
title A Study of Features and Deep Neural Network Architectures and Hyper-Parameters for Domestic Audio Classification
title_short A Study of Features and Deep Neural Network Architectures and Hyper-Parameters for Domestic Audio Classification
title_full A Study of Features and Deep Neural Network Architectures and Hyper-Parameters for Domestic Audio Classification
title_fullStr A Study of Features and Deep Neural Network Architectures and Hyper-Parameters for Domestic Audio Classification
title_full_unstemmed A Study of Features and Deep Neural Network Architectures and Hyper-Parameters for Domestic Audio Classification
title_sort study of features and deep neural network architectures and hyper-parameters for domestic audio classification
publisher MDPI AG
series Applied Sciences
issn 2076-3417
publishDate 2021-05-01
description Recent methodologies for audio classification frequently involve cepstral and spectral features applied to single-channel recordings of acoustic scenes and events. Further, the concept of transfer learning has been widely used over the years and has proven to be an efficient alternative to training neural networks from scratch. The lower time and resource requirements of pre-trained models allow for more versatility in developing classification systems. However, information on classification performance when using different features for multi-channel recordings is often limited. Furthermore, pre-trained networks are initially trained on large databases and are often unnecessarily large. This poses a challenge when developing systems for devices with limited computational resources, such as mobile or embedded devices. This paper presents a detailed study of the most prominent and widely used cepstral and spectral features for multi-channel audio applications. Accordingly, we propose the use of spectro-temporal features. Additionally, the paper details the development of a compact version of the AlexNet model for computationally limited platforms through studies of performance under various architectural and parameter modifications of the original network. The aim is to minimize the network size while maintaining the series network architecture and preserving the classification accuracy. Considering that other state-of-the-art compact networks present complex directed acyclic graphs, a series architecture offers an advantage in customizability. Experimentation was carried out in MATLAB, using a database that we generated for this task, which consists of four-channel synthetic recordings of both sound events and scenes. The top-performing methodology resulted in a weighted F1-score of 87.92% for scalogram features classified via the modified AlexNet-33 network, which has a size of 14.33 MB. The original AlexNet network returned 86.24% at a size of 222.71 MB.
topic neural network
transfer learning
scalograms
MFCC
Log-mel
pre-trained models
url https://www.mdpi.com/2076-3417/11/11/4880
work_keys_str_mv AT abigailcopiaco astudyoffeaturesanddeepneuralnetworkarchitecturesandhyperparametersfordomesticaudioclassification
AT christianritz astudyoffeaturesanddeepneuralnetworkarchitecturesandhyperparametersfordomesticaudioclassification
AT nidhalabdulaziz astudyoffeaturesanddeepneuralnetworkarchitecturesandhyperparametersfordomesticaudioclassification
AT stefanofasciani astudyoffeaturesanddeepneuralnetworkarchitecturesandhyperparametersfordomesticaudioclassification
AT abigailcopiaco studyoffeaturesanddeepneuralnetworkarchitecturesandhyperparametersfordomesticaudioclassification
AT christianritz studyoffeaturesanddeepneuralnetworkarchitecturesandhyperparametersfordomesticaudioclassification
AT nidhalabdulaziz studyoffeaturesanddeepneuralnetworkarchitecturesandhyperparametersfordomesticaudioclassification
AT stefanofasciani studyoffeaturesanddeepneuralnetworkarchitecturesandhyperparametersfordomesticaudioclassification
_version_ 1721413040799219712