A benchmark of dynamic versus static methods for facial action unit detection

Abstract: Action Units (AUs) are activations of local, individual facial muscle groups that unfold over time to constitute a natural facial expression event. The occurrence of AUs can therefore be detected from the temporally evolving movements of these regions. Automatic AU detection offers clear benefits because it can draw on both static and dynamic facial features. Our work makes three contributions. First, we extracted features using Local Binary Patterns (LBP), Local Phase Quantisation (LPQ), and the dynamic texture descriptor LPQ-TOP, together with two network models leveraged from different CNN architectures, for local deep visual learning in AU image analysis. Second, we cascaded the LPQ-TOP feature vector with a Long Short-Term Memory (LSTM) network to encode longer-term temporal information, and we found that stacking an LSTM on top of a CNN allows spatial and temporal cues to be learned jointly; we also hypothesised that unsupervised Slow Feature Analysis (SFA) can extract invariant information from dynamic textures. Third, we compared continuous scoring predictions among LPQ-TOP with SVM, LPQ-TOP with LSTM, and AlexNet. An extensive comparative performance evaluation was carried out on the Enhanced Cohn-Kanade (CK) dataset. Overall, the results indicate that the CNN approach is very promising and surpassed all other methods.
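To make the texture-descriptor side of the benchmark concrete, here is a minimal sketch of the basic LPQ operator in Python (assumptions: a 7x7 window, no decorrelation/whitening step, and NumPy/SciPy as the toolchain; this is an illustrative reconstruction, not the authors' code). Each pixel receives an 8-bit code built from the signs of the real and imaginary parts of four low-frequency short-term Fourier transform (STFT) coefficients, and the image is summarised as a 256-bin histogram of these codes.

```python
import numpy as np
from scipy.signal import convolve2d

def lpq_histogram(img, win=7):
    """Basic LPQ descriptor: 256-bin histogram of 8-bit local phase codes."""
    r = win // 2
    x = np.arange(-r, r + 1)
    a = 1.0 / win                           # lowest non-zero STFT frequency
    w0 = np.ones_like(x, dtype=complex)     # DC filter
    w1 = np.exp(-2j * np.pi * a * x)        # frequency +a filter
    w2 = np.conj(w1)                        # frequency -a filter
    img = np.asarray(img, dtype=float)

    def filt(row, col):
        # separable 2-D STFT filter: horizontal pass, then vertical pass
        tmp = convolve2d(img, row[np.newaxis, :], mode='valid')
        return convolve2d(tmp, col[:, np.newaxis], mode='valid')

    # STFT coefficients at frequencies (a,0), (0,a), (a,a), (a,-a)
    coeffs = [filt(w1, w0), filt(w0, w1), filt(w1, w1), filt(w1, w2)]

    # quantise the sign of each real/imaginary part into one bit
    codes = np.zeros(coeffs[0].shape, dtype=int)
    for i, f in enumerate(coeffs):
        codes |= (np.real(f) >= 0).astype(int) << (2 * i)
        codes |= (np.imag(f) >= 0).astype(int) << (2 * i + 1)

    hist, _ = np.histogram(codes, bins=256, range=(0, 256))
    return hist / hist.sum()                # normalised 256-bin descriptor
```

For LPQ-TOP, the same operator is applied to the XY, XT, and YT planes of an image sequence and the three histograms are concatenated; the resulting vector is what would be fed to the SVM or LSTM classifiers compared above.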

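The stacking of an LSTM on top of a CNN can likewise be sketched. The following PyTorch snippet is a hypothetical illustration, not the paper's model: the ResNet-18 backbone, the 256-unit LSTM, and the 12-AU output head are all assumptions made for the example. A CNN encodes each frame independently; the LSTM then consumes the per-frame feature sequence so that spatial and temporal information are learned jointly, producing continuous per-frame AU scores.

```python
# Sketch of an LSTM stacked on a per-frame CNN for AU detection.
# NOT the paper's implementation: the ResNet-18 backbone, hidden size,
# and number of action units (num_aus) are illustrative assumptions.
import torch
import torch.nn as nn
from torchvision import models

class CnnLstmAuDetector(nn.Module):
    def __init__(self, num_aus: int = 12, hidden: int = 256):
        super().__init__()
        backbone = models.resnet18(weights=None)    # frame-level spatial encoder
        feat_dim = backbone.fc.in_features          # 512-d pooled features
        backbone.fc = nn.Identity()                 # drop the classification head
        self.cnn = backbone
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)  # temporal model
        self.head = nn.Linear(hidden, num_aus)      # per-frame AU logits

    def forward(self, clips: torch.Tensor) -> torch.Tensor:
        # clips: (batch, time, 3, H, W) -> logits: (batch, time, num_aus)
        b, t, c, h, w = clips.shape
        feats = self.cnn(clips.reshape(b * t, c, h, w)).reshape(b, t, -1)
        temporal, _ = self.lstm(feats)              # encode frame order
        return self.head(temporal)

# Example: 4 clips of 16 frames; sigmoid turns logits into per-AU scores.
scores = torch.sigmoid(CnnLstmAuDetector()(torch.randn(4, 16, 3, 112, 112)))
```

Training such a model with a per-frame binary cross-entropy loss matches the multi-label nature of AU detection, since several action units can be active in the same frame.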

Bibliographic Details
Main Authors: L. Alharbawee, N. Pugeault (College of Engineering, Mathematics and Physical Sciences, University of Exeter, Exeter, UK)
Format: Article
Language: English
Published: Wiley, 2021-05-01
Series: The Journal of Engineering, vol. 2021, no. 5, pp. 252-266
ISSN: 2051-3305
Online Access: https://doi.org/10.1049/tje2.12001