Saliency Maps-Based Convolutional Neural Networks for Facial Expression Recognition
Facial expression recognition (FER) is an important research topic in affective computing and plays a key role in many applications of everyday life. The convolutional neural network (CNN), the most common method for extracting expression features, has one main limitation: because it lacks visual attention guidance, the expression information it extracts is contaminated by background noise, which lowers recognition accuracy. To simulate the attention mechanism of the human visual system, a salient feature extraction model is proposed, comprising a dilated inception module, a Difference of Gaussian (DOG) module, and a multi-indicator saliency prediction module. This model effectively captures key facial information by enlarging the receptive field, acquiring multiscale features, and simulating human vision. In addition, a novel FER method for a single person is proposed. Using saliency maps as prior knowledge together with the multilayer deep features of the CNN, recognition accuracy is improved by obtaining more targeted and more complete deep expression information. Experimental results for saliency prediction, action unit (AU) detection, and smile intensity estimation on the CAT2000, CK+, and BP4D databases show that the proposed method improves FER performance and is more effective than existing approaches.
Main Author: | Qinglan Wei (https://orcid.org/0000-0002-2710-0410), School of Data Science and Intelligent Media, Communication University of China, Beijing, China
---|---
Format: | Article
Language: | English
Published: | IEEE, 2021-01-01
Series: | IEEE Access, Vol. 9 (2021), pp. 76224-76234
DOI: | 10.1109/ACCESS.2021.3082694
ISSN: | 2169-3536
Subjects: | Facial expression recognition; saliency maps; dilated convolution; prior knowledge; convolutional neural network
Online Access: | https://ieeexplore.ieee.org/document/9438697/
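The record above is purely bibliographic, but the abstract outlines a concrete architecture: dilated convolutions to enlarge the receptive field, a Difference of Gaussian (DOG) module for multiscale contrast, and a saliency map used as a spatial prior to re-weight CNN features before classification. The sketch below is a minimal, hypothetical PyTorch illustration of those three ingredients, not the author's implementation; the module names (`DOGModule`, `DilatedInceptionBlock`, `SaliencyGuidedFER`), channel sizes, dilation rates, and Gaussian parameters are all assumptions chosen for clarity.

```python
# Illustrative sketch only (not the paper's released code): combines the three
# ingredients named in the abstract -- dilated convolutions, a Difference of
# Gaussian (DOG) branch, and saliency-weighted CNN features for FER.
import torch
import torch.nn as nn
import torch.nn.functional as F


def gaussian_kernel(size: int, sigma: float) -> torch.Tensor:
    """Normalized 2-D Gaussian kernel with shape (1, 1, size, size)."""
    coords = torch.arange(size, dtype=torch.float32) - (size - 1) / 2
    g = torch.exp(-coords ** 2 / (2 * sigma ** 2))
    kernel = torch.outer(g, g)
    return (kernel / kernel.sum()).view(1, 1, size, size)


class DOGModule(nn.Module):
    """Difference of Gaussians: subtract two differently blurred copies of the input."""

    def __init__(self, size: int = 7, sigma_small: float = 1.0, sigma_large: float = 2.0):
        super().__init__()
        self.register_buffer("k_small", gaussian_kernel(size, sigma_small))
        self.register_buffer("k_large", gaussian_kernel(size, sigma_large))
        self.pad = size // 2

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (N, C, H, W)
        c = x.shape[1]
        blur_small = F.conv2d(x, self.k_small.repeat(c, 1, 1, 1), padding=self.pad, groups=c)
        blur_large = F.conv2d(x, self.k_large.repeat(c, 1, 1, 1), padding=self.pad, groups=c)
        return blur_small - blur_large  # band-pass response highlighting edges and blobs


class DilatedInceptionBlock(nn.Module):
    """Parallel 3x3 convolutions with different dilation rates, concatenated channel-wise."""

    def __init__(self, in_ch: int, out_ch_per_branch: int = 16, rates=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList(
            [nn.Conv2d(in_ch, out_ch_per_branch, kernel_size=3, padding=r, dilation=r) for r in rates]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.cat([F.relu(branch(x)) for branch in self.branches], dim=1)


class SaliencyGuidedFER(nn.Module):
    """Toy FER head: CNN features are re-weighted by a predicted saliency map."""

    def __init__(self, num_classes: int = 7):
        super().__init__()
        self.dog = DOGModule()
        self.inception = DilatedInceptionBlock(in_ch=3)        # 3 input channels (RGB)
        self.saliency_head = nn.Conv2d(48, 1, kernel_size=1)   # 48 = 3 branches * 16 channels
        self.classifier = nn.Linear(48, num_classes)

    def forward(self, x: torch.Tensor):
        feats = self.inception(self.dog(x))                    # multiscale salient features
        saliency = torch.sigmoid(self.saliency_head(feats))    # (N, 1, H, W) saliency map
        weighted = feats * saliency                            # saliency map as a spatial prior
        pooled = F.adaptive_avg_pool2d(weighted, 1).flatten(1) # (N, 48)
        return self.classifier(pooled), saliency


if __name__ == "__main__":
    model = SaliencyGuidedFER()
    logits, saliency = model(torch.randn(2, 3, 128, 128))
    print(logits.shape, saliency.shape)  # torch.Size([2, 7]) torch.Size([2, 1, 128, 128])
```

Running the script prints the logits and saliency-map shapes for a batch of two 128x128 face crops. The actual method trains saliency prediction with multiple indicators and fuses multilayer deep features for recognition; this toy sketch only illustrates the overall flow.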