Discriminating Emotions in the Valence Dimension from Speech Using Timbre Features

The most commonly used and well-known acoustic features of a speech signal, the Mel-frequency cepstral coefficients (MFCCs), cannot sufficiently characterize emotions in speech when classifying both discrete emotions (i.e., anger, happiness, sadness, and neutral) and emotions in the valence dimension (positive and negative). The main reason is that some discrete emotions, such as anger and happiness, share similar acoustic features in the arousal dimension (high and low) but differ in the valence dimension. Timbre is the quality of a sound that distinguishes two sounds even when they have the same pitch and loudness. In this paper, we analyzed timbre acoustic features to improve the classification performance for discrete emotions as well as emotions in the valence dimension. Sequential forward selection (SFS) was used to find the most relevant acoustic features among the timbre acoustic features. The experiments were carried out on the Berlin Emotional Speech Database and the Interactive Emotional Dyadic Motion Capture Database. A support vector machine (SVM) and a long short-term memory recurrent neural network (LSTM-RNN) were used to classify emotions. Significant improvements in classification performance were achieved using a combination of the baseline features and the most relevant timbre acoustic features, which were identified by applying SFS to emotion classification on the Berlin Emotional Speech Database. Extensive experiments showed that timbre acoustic features can sufficiently characterize emotions in speech in the valence dimension.
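To make the described pipeline concrete, the following is a minimal, illustrative sketch (not the authors' exact feature set or configuration): it extracts a few timbre-related spectral descriptors plus MFCCs per utterance with librosa, then applies sequential forward selection with an SVM wrapper using scikit-learn. The specific descriptors, the number of selected features, and all parameters are assumptions for illustration only.

```python
# Illustrative sketch only: timbre-style descriptors + MFCCs per utterance,
# followed by sequential forward selection (SFS) with an SVM wrapper.
# Feature choices and parameters are assumptions, not the paper's exact setup.
import numpy as np
import librosa
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.feature_selection import SequentialFeatureSelector

def utterance_features(path, sr=16000):
    """Mean-pooled spectral/timbre descriptors for one utterance."""
    y, sr = librosa.load(path, sr=sr)
    frame_feats = [
        librosa.feature.spectral_centroid(y=y, sr=sr),   # brightness
        librosa.feature.spectral_bandwidth(y=y, sr=sr),  # spectral spread
        librosa.feature.spectral_rolloff(y=y, sr=sr),    # high-frequency content
        librosa.feature.spectral_flatness(y=y),          # noisiness vs. tonality
        librosa.feature.zero_crossing_rate(y),           # coarse noisiness cue
        librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13),     # baseline MFCCs
    ]
    # Average each frame-level descriptor over time -> one fixed-length vector.
    return np.concatenate([f.mean(axis=1) for f in frame_feats])

# X: utterance-level feature matrix, y: emotion labels (hypothetical data).
# X = np.stack([utterance_features(p) for p in wav_paths]); y = labels

svm = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
sfs = SequentialFeatureSelector(svm, n_features_to_select=10,
                                direction="forward", cv=5)
# sfs.fit(X, y)                  # greedily adds the feature that helps most
# X_selected = sfs.transform(X)  # reduced feature set fed to the final classifier
```

In this kind of wrapper-based selection, the SVM is retrained for each candidate feature, so the chosen subset is tuned to the classifier actually used downstream; an LSTM-RNN could then be trained on the same selected features for sequence-level classification.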

Bibliographic Details
Main Authors: Anvarjon Tursunov, Soonil Kwon, Hee-Suk Pang
Author Affiliations: Department of Digital Contents, Sejong University, Seoul 05006, Korea (Tursunov, Kwon); Department of Electrical Engineering, Sejong University, Seoul 05006, Korea (Pang)
Format: Article
Language: English
Published: MDPI AG, 2019-06-01
Series: Applied Sciences, Volume 9, Issue 12, Article 2470
ISSN: 2076-3417
DOI: 10.3390/app9122470
Subjects: timbre acoustic features; valence dimension; affective computing; emotion recognition; neural networks; speech processing
Online Access: https://www.mdpi.com/2076-3417/9/12/2470