Multimodal Attention Network for Continuous-Time Emotion Recognition Using Video and EEG Signals
Emotion recognition is an essential technique for natural interaction between humans and artificial intelligence systems. For effective emotion recognition in the continuous-time domain, this article presents a multimodal fusion network that integrates a video modality network and an electroencephalogram (EEG) modality network. To compute the attention weights of the facial video features and the corresponding EEG features during fusion, a multimodal attention network based on bilinear pooling with low-rank decomposition is proposed. Continuous-domain valence values are then computed from the two modality network outputs and the attention weights. Experimental results show that the proposed fusion network improves performance by about 6.9% over the video modality network on the MAHNOB human-computer interface (MAHNOB-HCI) dataset, and a performance improvement is also achieved on the authors' proprietary dataset.
Main Authors: | Dong Yoon Choi, Deok-Hwan Kim, Byung Cheol Song |
Format: | Article |
Language: | English |
Published: | IEEE, 2020-01-01 |
Series: | IEEE Access |
Subjects: | Emotion recognition; video; EEG; multimodality; multimodal fusion; attention |
Online Access: | https://ieeexplore.ieee.org/document/9252925/ |
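The abstract describes the fusion step only at a high level: attention weights for the video and EEG features are obtained from a bilinear pooling based on low-rank decomposition, and the continuous valence value is an attention-weighted combination of the two modality network outputs. The following is a minimal PyTorch-style sketch of that general idea, not the authors' implementation; the module name, feature dimensions, rank, and regression heads are illustrative assumptions.

```python
import torch
import torch.nn as nn


class LowRankBilinearAttentionFusion(nn.Module):
    """Attention-weighted fusion of video and EEG features (illustrative sketch).

    A joint representation is formed by low-rank bilinear pooling (elementwise
    product of low-rank projections of the two feature vectors); attention
    weights over the two modalities are derived from it and used to combine
    per-modality valence predictions into one continuous valence value.
    """

    def __init__(self, video_dim=256, eeg_dim=128, rank=64):
        super().__init__()
        # Low-rank projection matrices of the bilinear pooling
        self.proj_video = nn.Linear(video_dim, rank)
        self.proj_eeg = nn.Linear(eeg_dim, rank)
        # Maps the pooled joint vector to two attention logits (video, EEG)
        self.att = nn.Linear(rank, 2)
        # Simple per-modality regressors standing in for the modality networks' heads
        self.valence_video = nn.Linear(video_dim, 1)
        self.valence_eeg = nn.Linear(eeg_dim, 1)

    def forward(self, video_feat, eeg_feat):
        # Low-rank bilinear pooling: Hadamard product of projected features
        joint = torch.tanh(self.proj_video(video_feat)) * torch.tanh(self.proj_eeg(eeg_feat))
        # Attention weights for the two modalities (sum to 1 per sample)
        w = torch.softmax(self.att(joint), dim=-1)   # shape (batch, 2)
        # Per-modality valence predictions
        v_video = self.valence_video(video_feat)     # shape (batch, 1)
        v_eeg = self.valence_eeg(eeg_feat)           # shape (batch, 1)
        # Attention-weighted continuous valence estimate
        return w[:, 0:1] * v_video + w[:, 1:2] * v_eeg


# Example: a batch of 4 time steps with random features
fusion = LowRankBilinearAttentionFusion()
valence = fusion(torch.randn(4, 256), torch.randn(4, 128))
print(valence.shape)  # torch.Size([4, 1])
```

The softmax over the two attention logits keeps the modality weights non-negative and summing to one, which is one common way to realize the attention-weighted fusion the abstract describes.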
id |
doaj-9ec28fa775794f60831c9172acb75797 |
record_format |
Article |
spelling |
Dong Yoon Choi (https://orcid.org/0000-0003-2990-9691), Deok-Hwan Kim, and Byung Cheol Song (https://orcid.org/0000-0001-8742-3433), Department of Electronic Engineering, Inha University, Incheon, South Korea. "Multimodal Attention Network for Continuous-Time Emotion Recognition Using Video and EEG Signals," IEEE Access, vol. 8, pp. 203814-203826, 2020. DOI: 10.1109/ACCESS.2020.3036877 (IEEE document 9252925). DOAJ record doaj-9ec28fa775794f60831c9172acb75797, last updated 2021-03-30T04:34:17Z. |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Dong Yoon Choi; Deok-Hwan Kim; Byung Cheol Song |
title |
Multimodal Attention Network for Continuous-Time Emotion Recognition Using Video and EEG Signals |
publisher |
IEEE |
series |
IEEE Access |
issn |
2169-3536 |
publishDate |
2020-01-01 |
topic |
Emotion recognition; video; EEG; multimodality; multimodal fusion; attention |
url |
https://ieeexplore.ieee.org/document/9252925/ |