A benchmark of dynamic versus static methods for facial action unit detection
Abstract: Action units (AUs) are activations of individual facial muscle regions that unfold over time to constitute a natural facial expression event, so AU occurrence can be inferred from the temporally consecutive movements of these regions. Detecting AUs automatically offers clear benefits because it exploits both static and dynamic facial features. Our work makes three contributions. First, we extracted features using Local Binary Patterns (LBP), Local Phase Quantisation (LPQ), and the dynamic texture descriptor LPQ-TOP, together with two networks leveraged from different CNN architectures, for local deep visual learning in AU image analysis. Second, we cascaded the LPQ-TOP feature vector with a Long Short-Term Memory (LSTM) network to encode longer-term temporal information, and we found that stacking an LSTM on top of a CNN learns spatial and temporal structure simultaneously; we also hypothesised that unsupervised Slow Feature Analysis can extract invariant information from dynamic textures. Third, we compared continuous scoring predictions among LPQ-TOP with SVM, LPQ-TOP with LSTM, and AlexNet. A substantial comparative evaluation was carried out on the Enhanced CK dataset. Overall, the results indicate that the CNN is very promising and surpassed all other methods.
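To illustrate the static texture features the abstract names, here is a minimal NumPy sketch of a basic Local Binary Pattern (8 neighbours, radius 1) and its histogram feature vector. This is an assumption-level illustration with hypothetical helper names (`lbp_8_1`, `lbp_histogram`), not the authors' implementation, which may use a different LBP variant, radius, or cell layout:

```python
import numpy as np

def lbp_8_1(img):
    """Basic 8-neighbour Local Binary Pattern (radius 1) on a 2-D grayscale image.

    Each interior pixel is replaced by an 8-bit code: one bit per neighbour,
    set when that neighbour is >= the centre pixel.
    """
    img = np.asarray(img, dtype=np.int32)
    c = img[1:-1, 1:-1]  # centre pixels (borders are skipped)
    # clockwise neighbour offsets starting at the top-left
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros_like(c)
    for bit, (dy, dx) in enumerate(offsets):
        nb = img[1 + dy: img.shape[0] - 1 + dy,
                 1 + dx: img.shape[1] - 1 + dx]
        code |= (nb >= c).astype(np.int32) << bit
    return code

def lbp_histogram(img, bins=256):
    """L1-normalised histogram of LBP codes -- the per-region texture feature."""
    codes = lbp_8_1(img)
    hist, _ = np.histogram(codes, bins=bins, range=(0, bins))
    return hist / max(hist.sum(), 1)
```

In practice such histograms are computed per facial region and concatenated; the dynamic LPQ-TOP descriptor extends the same idea by pooling codes over three orthogonal planes (XY, XT, YT) of the image sequence.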
Main Authors: L. Alharbawee, N. Pugeault
Format: Article
Language: English
Published: Wiley, 2021-05-01
Series: The Journal of Engineering
Online Access: https://doi.org/10.1049/tje2.12001
ISSN: 2051-3305
Citation: The Journal of Engineering, 2021(5), pp. 252-266
DOI: 10.1049/tje2.12001
Affiliation: College of Engineering Mathematics and Physical Sciences, University of Exeter, Exeter, UK (both authors)
Indexed in: DOAJ (record doaj-e3095058f95743baaab8b1bcc5f13afb)