Summary: | Abstract Action Units activation is a set of local individual facial muscle parts that occur in time constituting a natural facial expression event. AUs occurrence activation detection can be inferred as temporally consecutive evolving movements of these parts. Detecting AUs automatically can provide explicit benefits since it considers both static and dynamic facial features. Our work is divided into three contributions: first, we extracted the features from Local Binary Patterns, Local Phase Quantisation, and dynamic texture descriptor LPQTOP with two distinct leveraged network models from different CNN architectures for local deep visual learning for AU image analysis. Second, cascading the LPQTOP feature vector with Long Short‐Term Memory is used for coding longer term temporal information. Next, we discovered the importance of stacking LSTM on top of CNN for learning temporal information in combining the spatially and temporally schemes simultaneously. Also, we hypothesised that using an unsupervised Slow Feature Analysis method is able to leach invariant information from dynamic textures. Third, we compared continuous scoring predictions between LPQTOP and SVM, LPQTOP with LSTM, and AlexNet. A competitive substantial performance evaluation was carried out on the Enhanced CK dataset. Overall, the results indicate that CNN is very promising and surpassed all other methods
|