Saliency Maps-Based Convolutional Neural Networks for Facial Expression Recognition

Facial expression recognition (FER) is an important research topic in affective computing and plays a key role in many applications of daily life. The convolutional neural network (CNN), the most common expression feature extractor, has one main limitation: because it lacks visual attention guidance, it picks up background noise along with the expression information, which lowers recognition accuracy. To simulate the attention mechanism of the human visual system, a salient feature extraction model is proposed, consisting of a dilated inception module, a Difference of Gaussian (DOG) module, and a multi-indicator saliency prediction module. This model captures the key facial information by enlarging the receptive field, acquiring multiscale features, and simulating human vision. In addition, a novel single-person FER method is proposed: using saliency maps as prior knowledge together with the multilayer deep features of the CNN, it obtains more targeted and more complete deep expression information, improving recognition accuracy. Experimental results for saliency prediction, action unit (AU) detection, and smile intensity estimation on the CAT2000, CK+, and BP4D databases show that the proposed method improves FER performance and is more effective than existing approaches.

Bibliographic Details
Main Author: Qinglan Wei
Format: Article
Language: English
Published: IEEE 2021-01-01
Series: IEEE Access
Subjects:
Online Access: https://ieeexplore.ieee.org/document/9438697/
id doaj-78b81ab47a5345b0b7b5ddb6e63ffae6
record_format Article
spelling doaj-78b81ab47a5345b0b7b5ddb6e63ffae6 | 2021-05-27T23:04:25Z | eng | IEEE | IEEE Access | ISSN 2169-3536 | 2021-01-01 | Vol. 9, pp. 76224-76234 | DOI 10.1109/ACCESS.2021.3082694 | Article 9438697 | Saliency Maps-Based Convolutional Neural Networks for Facial Expression Recognition | Qinglan Wei (https://orcid.org/0000-0002-2710-0410), School of Data Science and Intelligent Media, Communication University of China, Beijing, China | Facial expression recognition (FER) is an important research topic in affective computing and plays a key role in many applications of daily life. The convolutional neural network (CNN), the most common expression feature extractor, has one main limitation: because it lacks visual attention guidance, it picks up background noise along with the expression information, which lowers recognition accuracy. To simulate the attention mechanism of the human visual system, a salient feature extraction model is proposed, consisting of a dilated inception module, a Difference of Gaussian (DOG) module, and a multi-indicator saliency prediction module. This model captures the key facial information by enlarging the receptive field, acquiring multiscale features, and simulating human vision. In addition, a novel single-person FER method is proposed: using saliency maps as prior knowledge together with the multilayer deep features of the CNN, it obtains more targeted and more complete deep expression information, improving recognition accuracy. Experimental results for saliency prediction, action unit (AU) detection, and smile intensity estimation on the CAT2000, CK+, and BP4D databases show that the proposed method improves FER performance and is more effective than existing approaches. | https://ieeexplore.ieee.org/document/9438697/ | Keywords: facial expression recognition; saliency maps; dilated convolution; prior knowledge; convolutional neural network
collection DOAJ
language English
format Article
sources DOAJ
author Qinglan Wei
spellingShingle Qinglan Wei
Saliency Maps-Based Convolutional Neural Networks for Facial Expression Recognition
IEEE Access
Facial expression recognition
saliency maps
dilated convolution
prior knowledge
convolutional neural network
author_facet Qinglan Wei
author_sort Qinglan Wei
title Saliency Maps-Based Convolutional Neural Networks for Facial Expression Recognition
title_short Saliency Maps-Based Convolutional Neural Networks for Facial Expression Recognition
title_full Saliency Maps-Based Convolutional Neural Networks for Facial Expression Recognition
title_fullStr Saliency Maps-Based Convolutional Neural Networks for Facial Expression Recognition
title_full_unstemmed Saliency Maps-Based Convolutional Neural Networks for Facial Expression Recognition
title_sort saliency maps-based convolutional neural networks for facial expression recognition
publisher IEEE
series IEEE Access
issn 2169-3536
publishDate 2021-01-01
description Facial expression recognition (FER) is an important research topic in affective computing and plays a key role in many applications of daily life. The convolutional neural network (CNN), the most common expression feature extractor, has one main limitation: because it lacks visual attention guidance, it picks up background noise along with the expression information, which lowers recognition accuracy. To simulate the attention mechanism of the human visual system, a salient feature extraction model is proposed, consisting of a dilated inception module, a Difference of Gaussian (DOG) module, and a multi-indicator saliency prediction module. This model captures the key facial information by enlarging the receptive field, acquiring multiscale features, and simulating human vision. In addition, a novel single-person FER method is proposed: using saliency maps as prior knowledge together with the multilayer deep features of the CNN, it obtains more targeted and more complete deep expression information, improving recognition accuracy. Experimental results for saliency prediction, action unit (AU) detection, and smile intensity estimation on the CAT2000, CK+, and BP4D databases show that the proposed method improves FER performance and is more effective than existing approaches.
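The abstract credits the dilated inception module with "enlarging the receptive field". The paper's module design is not given in this record, but the general receptive-field arithmetic for stride-1 dilated convolutions can be sketched as follows; `receptive_field` is an illustrative helper, not code from the paper.

```python
def receptive_field(layers):
    """Receptive field of a stack of stride-1 convolutions.

    `layers` is a list of (kernel_size, dilation) pairs.
    Each layer grows the field by (kernel_size - 1) * dilation,
    so dilation widens coverage without adding parameters.
    """
    rf = 1
    for kernel_size, dilation in layers:
        rf += (kernel_size - 1) * dilation
    return rf

# Three plain 3x3 convolutions cover a 7x7 field.
print(receptive_field([(3, 1)] * 3))               # 7
# The same depth with dilations 1, 2, 4 covers 15x15 at the same cost.
print(receptive_field([(3, 1), (3, 2), (3, 4)]))   # 15
```

This is why dilated stacks are a common way to see more facial context per layer without extra weights.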
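The DOG module's internals are likewise not detailed here; below is a minimal NumPy sketch of the classic Difference-of-Gaussians operator the module is named after, which responds strongly at edges and blobs (a rough proxy for early human vision) and weakly in flat regions. All function names are illustrative assumptions.

```python
import numpy as np

def gaussian_kernel(sigma, radius=None):
    # 1-D Gaussian kernel, normalised to sum to 1.
    if radius is None:
        radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x**2 / (2 * sigma**2))
    return k / k.sum()

def blur(img, sigma):
    # Separable Gaussian blur: filter rows, then columns.
    k = gaussian_kernel(sigma)
    out = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, img)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, out)

def dog_map(img, sigma=1.0, ratio=1.6):
    # Difference of Gaussians: subtracting a wider blur from a
    # narrower one leaves a band-pass response that peaks near
    # intensity changes and vanishes in uniform areas.
    d = np.abs(blur(img, sigma) - blur(img, ratio * sigma))
    return d / d.max() if d.max() > 0 else d

# A bright square on a dark background: the response concentrates
# along the square's border, not in its flat interior.
img = np.zeros((40, 40))
img[10:30, 10:30] = 1.0
sal = dog_map(img)
```

Here `sal[10, 20]` (on the border) is clearly larger than `sal[20, 20]` (flat interior), which is the behaviour a saliency prior wants.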
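Finally, using a saliency map as "prior knowledge" over CNN features amounts, in its simplest form, to reweighting feature maps so background activations are suppressed. The sketch below shows that idea with a nearest-neighbour resize and a broadcast multiply; it is a generic illustration under stated assumptions, not the paper's exact fusion scheme.

```python
import numpy as np

def saliency_weighted_features(features, saliency):
    """Reweight CNN feature maps with a saliency map as a prior.

    features: (C, H, W) activations from some CNN layer.
    saliency: (H', W') map in [0, 1]; nearest-neighbour resized
              to (H, W), then broadcast across all C channels.
    """
    c, h, w = features.shape
    rows = np.arange(h) * saliency.shape[0] // h
    cols = np.arange(w) * saliency.shape[1] // w
    resized = saliency[np.ix_(rows, cols)]
    # Background positions (saliency near 0) are damped;
    # salient positions pass through unchanged.
    return features * resized[None, :, :]

feats = np.ones((4, 2, 2))
sal_prior = np.array([[1.0, 0.0],
                      [0.5, 0.5]])
out = saliency_weighted_features(feats, sal_prior)
```

The multiplicative gating is the simplest choice; attention-style methods often add a residual term so non-salient features are attenuated rather than zeroed.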
topic Facial expression recognition
saliency maps
dilated convolution
prior knowledge
convolutional neural network
url https://ieeexplore.ieee.org/document/9438697/
work_keys_str_mv AT qinglanwei saliencymapsbasedconvolutionalneuralnetworksforfacialexpressionrecognition
_version_ 1721425109088993280