Recognition of Emotions in Speech Using Multi-level Units and Hierarchical Correlation Models

Master's === National Cheng Kung University === Department of Computer Science and Information Engineering (Master's and Doctoral Program) === 101 === In recent years, with the development of affective computing, emotion recognition has become a critical topic in creating intelligent human-computer interfaces. Speech is one of the most efficient means of human-to-human communication. Therefore to make machines able...


Bibliographic Details
Main Authors: Kuan-Chun Cheng, 鄭冠群
Other Authors: Chung-Hsien Wu
Format: Others
Language: en_US
Published: 2013
Online Access: http://ndltd.ncl.edu.tw/handle/06288055652465276453
id ndltd-TW-101NCKU5392075
record_format oai_dc
spelling ndltd-TW-101NCKU5392075 2015-10-13T22:51:44Z http://ndltd.ncl.edu.tw/handle/06288055652465276453 Recognition of Emotions in Speech Using Multi-level Units and Hierarchical Correlation Models 應用多重時間層級單元與階層關聯模型於語音情緒辨識 Kuan-Chun Cheng 鄭冠群 Master's National Cheng Kung University Department of Computer Science and Information Engineering (Master's and Doctoral Program) 101 Chung-Hsien Wu 吳宗憲 2013 thesis 58 en_US
collection NDLTD
language en_US
format Others
sources NDLTD
description Master's === National Cheng Kung University === Department of Computer Science and Information Engineering (Master's and Doctoral Program) === 101 === In recent years, with the development of affective computing, emotion recognition has become a critical topic in creating intelligent human-computer interfaces. Speech is one of the most efficient means of human-to-human communication. Therefore, for machines to communicate with humans more effectively, understanding the information carried by speech, such as emotions and speech intentions, is very important. This thesis focuses on detecting emotions in speech and proposes an approach to speech emotion recognition that uses multi-level temporal information. Multi-level Unit Chunking is first employed to segment emotional units at different temporal levels, and a Hierarchical Correlation Model is then used to integrate the information from those units. For Multi-level Unit Chunking, an edge detection algorithm locates the boundaries of change and yields the emotional units automatically. Three types of chunking units are determined for each utterance: the basic unit, the sub-emotion unit, and the emotion unit, within which spectral energy, prosodic features, and the emotion profile, respectively, remain consistent. After the units at each level are located, a Hierarchical Correlation Model is proposed to model the hierarchical structure of the utterance. For each unit, static features are extracted and converted to an emotion profile vector that serves as its soft emotion label. Single-segmentation-level models are trained using the emotion profile vectors, each weighted by the duration of its corresponding unit. To measure the correlation between units, vector quantization is performed using the k-means clustering algorithm: the quantized vector of each unit is determined by the closest cluster. The correlation is calculated statistically and fused with the results from each single-temporal-level model. The final decision for the utterance is the emotion with the highest score. The proposed approach was evaluated on the Berlin Emotional Speech Database (EMO-DB), where it achieved 71.69% accuracy, outperforming previous approaches. With speaker normalization, accuracy reaches 83.55% in six-emotion recognition.
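The duration weighting, vector quantization, and highest-score decision described in the abstract might be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: the six emotion labels, the equal fusion weights, and every function name here are hypothetical, and the statistical correlation term between units is not modeled because the abstract does not specify how it is computed.

```python
from math import dist

# Hypothetical label set: the abstract says six emotions are recognized
# but does not list them.
EMOTIONS = ["anger", "boredom", "fear", "happiness", "neutral", "sadness"]


def weighted_profile(units):
    """Average emotion-profile vectors, weighting each unit by its duration.

    `units` is a list of (duration_seconds, profile) pairs for one temporal level.
    """
    total = sum(duration for duration, _ in units)
    dims = len(units[0][1])
    return [
        sum(duration * profile[i] for duration, profile in units) / total
        for i in range(dims)
    ]


def quantize(profile, centroids):
    """Return the index of the closest centroid (the vector-quantization step).

    The centroids would come from k-means training, which is omitted here.
    """
    return min(range(len(centroids)), key=lambda k: dist(profile, centroids[k]))


def fuse_and_decide(level_scores, weights=None):
    """Combine per-level score vectors and pick the highest-scoring emotion.

    Equal weights are an assumption; the abstract does not give the fusion rule.
    """
    if weights is None:
        weights = [1.0 / len(level_scores)] * len(level_scores)
    fused = [
        sum(w * scores[i] for w, scores in zip(weights, level_scores))
        for i in range(len(EMOTIONS))
    ]
    best = max(range(len(fused)), key=fused.__getitem__)
    return EMOTIONS[best], fused
```

For instance, `weighted_profile` would be called once per temporal level (basic, sub-emotion, emotion) to produce the three score vectors passed to `fuse_and_decide`.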
author2 Chung-Hsien Wu
author_facet Chung-Hsien Wu
Kuan-Chun Cheng
鄭冠群
author Kuan-Chun Cheng
鄭冠群
title Recognition of Emotions in Speech Using Multi-level Units and Hierarchical Correlation Models
publishDate 2013
url http://ndltd.ncl.edu.tw/handle/06288055652465276453