Multimodal Attention Network for Continuous-Time Emotion Recognition Using Video and EEG Signals

Emotion recognition is a key technique for natural interaction between humans and artificial intelligence systems. For effective emotion recognition in the continuous-time domain, this article presents a multimodal fusion network that integrates a video modality network and an electroencephalogram (EEG) modality network. To compute the attention weights of facial video features and the corresponding EEG features during fusion, a multimodal attention network that employs bilinear pooling based on low-rank decomposition is proposed. Finally, continuous-domain valence values are computed from the outputs of the two modality networks and the attention weights. Experimental results show that the proposed fusion network improves performance by about 6.9% over the video modality network on the MAHNOB human-computer interface (MAHNOB-HCI) dataset, and a performance improvement is also achieved on our proprietary dataset.
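
As a rough illustration of the fusion scheme described in the abstract, the following minimal NumPy sketch computes per-modality attention weights with low-rank bilinear pooling and blends two valence predictions. All dimensions, the weight matrices (U, V, w_att), and the function names are illustrative assumptions, not the authors' implementation.

import numpy as np

rng = np.random.default_rng(0)

D_VIDEO, D_EEG, RANK = 256, 128, 64        # assumed feature and rank sizes

# Random matrices stand in for learned projection parameters.
U = rng.standard_normal((D_VIDEO, RANK)) * 0.01   # video features -> rank-R space
V = rng.standard_normal((D_EEG, RANK)) * 0.01     # EEG features -> rank-R space
w_att = rng.standard_normal((RANK, 2)) * 0.01     # joint vector -> two attention logits

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_weights(video_feat, eeg_feat):
    # Low-rank bilinear pooling: project each modality, combine with an
    # element-wise (Hadamard) product, then map to one logit per modality.
    joint = np.tanh(video_feat @ U) * np.tanh(eeg_feat @ V)
    return softmax(joint @ w_att)

def fuse_valence(video_valence, eeg_valence, video_feat, eeg_feat):
    # Weighted combination of the two modality networks' valence outputs.
    a_video, a_eeg = attention_weights(video_feat, eeg_feat)
    return a_video * video_valence + a_eeg * eeg_valence

# Example with dummy features and per-branch valence predictions.
video_feat = rng.standard_normal(D_VIDEO)
eeg_feat = rng.standard_normal(D_EEG)
print(fuse_valence(0.4, 0.1, video_feat, eeg_feat))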

Bibliographic Details
Main Authors: Dong Yoon Choi, Deok-Hwan Kim, Byung Cheol Song
Format: Article
Language: English
Published: IEEE 2020-01-01
Series: IEEE Access
Subjects: Emotion recognition
video
EEG
multimodality
multimodal fusion
attention
Online Access: https://ieeexplore.ieee.org/document/9252925/
id doaj-9ec28fa775794f60831c9172acb75797
record_format Article
spelling doaj-9ec28fa775794f60831c9172acb75797 2021-03-30T04:34:17Z
publisher IEEE
series IEEE Access (ISSN 2169-3536)
publishDate 2020-01-01
volume 8
pages 203814-203826
doi 10.1109/ACCESS.2020.3036877
article_number 9252925
title Multimodal Attention Network for Continuous-Time Emotion Recognition Using Video and EEG Signals
authors Dong Yoon Choi (https://orcid.org/0000-0003-2990-9691); Deok-Hwan Kim; Byung Cheol Song (https://orcid.org/0000-0001-8742-3433)
affiliation Department of Electronic Engineering, Inha University, Incheon, South Korea (all authors)
url https://ieeexplore.ieee.org/document/9252925/
topic Emotion recognition; video; EEG; multimodality; multimodal fusion; attention
collection DOAJ
language English
format Article
sources DOAJ
author Dong Yoon Choi
Deok-Hwan Kim
Byung Cheol Song
spellingShingle Dong Yoon Choi
Deok-Hwan Kim
Byung Cheol Song
Multimodal Attention Network for Continuous-Time Emotion Recognition Using Video and EEG Signals
IEEE Access
Emotion recognition
video
EEG
multimodality
multimodal fusion
attention
author_facet Dong Yoon Choi
Deok-Hwan Kim
Byung Cheol Song
author_sort Dong Yoon Choi
title Multimodal Attention Network for Continuous-Time Emotion Recognition Using Video and EEG Signals
title_short Multimodal Attention Network for Continuous-Time Emotion Recognition Using Video and EEG Signals
title_full Multimodal Attention Network for Continuous-Time Emotion Recognition Using Video and EEG Signals
title_fullStr Multimodal Attention Network for Continuous-Time Emotion Recognition Using Video and EEG Signals
title_full_unstemmed Multimodal Attention Network for Continuous-Time Emotion Recognition Using Video and EEG Signals
title_sort multimodal attention network for continuous-time emotion recognition using video and eeg signals
publisher IEEE
series IEEE Access
issn 2169-3536
publishDate 2020-01-01
description Emotion recognition is a key technique for natural interaction between humans and artificial intelligence systems. For effective emotion recognition in the continuous-time domain, this article presents a multimodal fusion network that integrates a video modality network and an electroencephalogram (EEG) modality network. To compute the attention weights of facial video features and the corresponding EEG features during fusion, a multimodal attention network that employs bilinear pooling based on low-rank decomposition is proposed. Finally, continuous-domain valence values are computed from the outputs of the two modality networks and the attention weights. Experimental results show that the proposed fusion network improves performance by about 6.9% over the video modality network on the MAHNOB human-computer interface (MAHNOB-HCI) dataset, and a performance improvement is also achieved on our proprietary dataset.
topic Emotion recognition
video
EEG
multimodality
multimodal fusion
attention
url https://ieeexplore.ieee.org/document/9252925/
work_keys_str_mv AT dongyoonchoi multimodalattentionnetworkforcontinuoustimeemotionrecognitionusingvideoandeegsignals
AT deokhwankim multimodalattentionnetworkforcontinuoustimeemotionrecognitionusingvideoandeegsignals
AT byungcheolsong multimodalattentionnetworkforcontinuoustimeemotionrecognitionusingvideoandeegsignals
_version_ 1724181567352340480