Inferring Speaker Affect in Spoken Natural Language Communication
The field of spoken language processing is concerned with creating computer programs that can understand human speech and produce human-like speech. Regarding the problem of understanding human speech, there is currently growing interest in moving beyond speech recognition (the task of transcribing t...
Main Author: | Pon-Barry, Heather Roberta |
---|---|
Other Authors: | Shieber, Stuart M. |
Language: | en_US |
Published: | Harvard University, 2013 |
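Subjects: | Computer science; Linguistics; Psychology; Affect Recognition; Emotion Recognition; Natural Language; Prosody; Speech |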
Online Access: | http://dissertations.umi.com/gsas.harvard:10710 http://nrs.harvard.edu/urn-3:HUL.InstRepos:10417532 |
id |
ndltd-harvard.edu-oai-dash.harvard.edu-1-10417532 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-harvard.edu-oai-dash.harvard.edu-1-10417532; 2015-08-14T15:42:03Z; Inferring Speaker Affect in Spoken Natural Language Communication; Pon-Barry, Heather Roberta; Computer science; Linguistics; Psychology; Affect Recognition; Emotion Recognition; Natural Language; Prosody; Speech; Engineering and Applied Sciences; Shieber, Stuart M.; 2013-03-15T17:59:47Z; 2013-03-15; 2012; 2013-03-15T17:59:47Z; Thesis or Dissertation; Pon-Barry, Heather Roberta. 2012. Inferring Speaker Affect in Spoken Natural Language Communication. Doctoral dissertation, Harvard University.; http://dissertations.umi.com/gsas.harvard:10710; http://nrs.harvard.edu/urn-3:HUL.InstRepos:10417532; en_US; open; http://nrs.harvard.edu/urn-3:HUL.InstRepos:dash.current.terms-of-use#LAA; Harvard University |
collection |
NDLTD |
language |
en_US |
sources |
NDLTD |
topic |
Computer science; Linguistics; Psychology; Affect Recognition; Emotion Recognition; Natural Language; Prosody; Speech |
description |
The field of spoken language processing is concerned with creating computer programs that can understand human speech and produce human-like speech. Regarding the problem of understanding human speech, there is currently growing interest in moving beyond speech recognition (the task of transcribing the words in an audio stream) and towards machine listening—interpreting the full spectrum of information in an audio stream. One part of machine listening, the problem that this thesis focuses on, is the task of using information in the speech signal to infer a person’s emotional or mental state. In this dissertation, our approach is to assess the utility of prosody, or manner of speaking, in classifying speaker affect. Prosody refers to the acoustic features of natural speech: rhythm, stress, intonation, and energy. Affect refers to a person’s emotions and attitudes such as happiness, frustration, or uncertainty. We focus on one specific dimension of affect: level of certainty. Our goal is to automatically infer whether a person is confident or uncertain based on the prosody of his or her speech. Potential applications include conversational dialogue systems (e.g., in educational technology) and voice search (e.g., smartphone personal assistants). There are three main contributions of this thesis. The first contribution is a method for eliciting uncertain speech that binds a speaker’s uncertainty to a single phrase within the larger utterance, allowing us to compare the utility of contextually-based prosodic features. Second, we devise a technique for computing prosodic features from utterance segments that both improves uncertainty classification and can be used to determine which phrase a speaker is uncertain about. The level of certainty classifier achieves an accuracy of 75%. 
Third, we examine the differences between perceived, self-reported, and internal level of certainty, concluding that perceived certainty is aligned with internal certainty for some but not all speakers and that self-reports are a good proxy for internal certainty. (Engineering and Applied Sciences) |
author2 |
Shieber, Stuart M. |
author_facet |
Shieber, Stuart M.; Pon-Barry, Heather Roberta |
author |
Pon-Barry, Heather Roberta |
title |
Inferring Speaker Affect in Spoken Natural Language Communication |
publisher |
Harvard University |
publishDate |
2013 |
url |
http://dissertations.umi.com/gsas.harvard:10710 http://nrs.harvard.edu/urn-3:HUL.InstRepos:10417532 |