Semantic Labeling of Nonspeech Audio Clips

Human communication about entities and events is primarily linguistic in nature. While visual representations of information have also been shown to be highly effective, relatively little is known about the communicative power of auditory nonlinguistic representations. We created a collection of short nonlinguistic auditory clips encoding familiar human activities, objects, animals, natural phenomena, machinery, and social scenes. We presented these sounds to a broad spectrum of anonymous human workers on Amazon Mechanical Turk and collected verbal sound labels. We analyzed the human labels in terms of their lexical and semantic properties to ascertain that the audio clips do evoke the information suggested by their predefined captions. We then measured, for each sound clip, the degree of agreement among the semantically compatible labels. Finally, we examined which kinds of entities and events, when captured by nonlinguistic acoustic clips, appear to be well suited to elicit information for communication, and which ones are less discriminable. Our work is set against the broader goal of creating resources that facilitate communication for people with some types of language loss. Furthermore, our data should prove useful for future research in machine analysis and synthesis of audio, such as computational auditory scene analysis, and in annotating and querying large collections of sound effects.
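The abstract describes checking whether worker labels are semantically compatible with each clip's predefined caption and then measuring per-clip agreement, but this record does not spell out the procedure. The snippet below is a minimal illustrative sketch of one plausible approach, not the authors' actual method: it assumes WordNet path similarity as the compatibility test, an arbitrary 0.25 threshold, and hypothetical label data.

```python
# Illustrative sketch only; the exact procedure is not given in this record.
# Assumptions (not from the source): WordNet path similarity as the
# compatibility criterion, a 0.25 threshold, and toy label data.
# Requires: pip install nltk, then nltk.download("wordnet") once.
from nltk.corpus import wordnet as wn

def compatible(label, caption_word, threshold=0.25):
    """Treat a worker label as semantically compatible with the clip's
    predefined caption if any pair of their WordNet noun senses is
    sufficiently close under path similarity."""
    for s1 in wn.synsets(label, pos=wn.NOUN):
        for s2 in wn.synsets(caption_word, pos=wn.NOUN):
            sim = s1.path_similarity(s2)
            if sim is not None and sim >= threshold:
                return True
    return False

def agreement(labels, caption_word):
    """Fraction of worker labels compatible with the caption (0 to 1)."""
    if not labels:
        return 0.0
    return sum(compatible(lbl, caption_word) for lbl in labels) / len(labels)

# Hypothetical labels collected for a clip captioned "dog".
labels = ["dog", "bark", "puppy", "thunder"]
print(agreement(labels, "dog"))  # prints a per-clip agreement score in [0, 1]
```

Other compatibility criteria (e.g., shared hypernyms or Wu-Palmer similarity) would slot into `compatible` unchanged; the per-clip agreement score is agnostic to that choice.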

Bibliographic Details
Main Authors: Xiaojuan Ma, Christiane Fellbaum, Perry Cook
Format: Article
Language: English
Published: SpringerOpen, 2010-01-01
Series: EURASIP Journal on Audio, Speech, and Music Processing
ISSN: 1687-4714, 1687-4722
Online Access: http://dx.doi.org/10.1155/2010/404860