AudioPairBank: towards a large-scale tag-pair-based audio content analysis

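Abstract Recently, sound recognition has been used to identify sounds, such as the sound of a car or a river. However, sounds have nuances that may be better described by adjective-noun pairs such as “slow car” and verb-noun pairs such as “flying insects,” which are underexplored. Therefore, this work investigates the relationship between audio content and both adjective-noun pairs and verb-noun pairs. Because of the lack of datasets with these kinds of annotations, we collected and processed the AudioPairBank corpus, which consists of a combined total of 1123 pairs and over 33,000 audio files. In this paper, we include previously unavailable documentation of the challenges and implications of collecting audio recordings with these types of labels. We also show the degree of correlation between the audio content and the labels through classification experiments, which yielded 70% accuracy. The results and study in this paper encourage further exploration of the nuances in sounds and are meant to complement similar research performed on images and text in multimedia analysis.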

Bibliographic Details
Main Authors: Sebastian Säger, Benjamin Elizalde, Damian Borth, Christian Schulze, Bhiksha Raj, Ian Lane
Format: Article
Language: English
Published: SpringerOpen 2018-09-01
Series: EURASIP Journal on Audio, Speech, and Music Processing
Subjects: Sound event database; Audio content analysis; Machine learning; Signal processing
Online Access: http://link.springer.com/article/10.1186/s13636-018-0137-5
Affiliations: Sebastian Säger, Damian Borth, Christian Schulze (University of Kaiserslautern, DFKI); Benjamin Elizalde, Bhiksha Raj, Ian Lane (Carnegie Mellon University)
DOI: 10.1186/s13636-018-0137-5
ISSN: 1687-4722