SlowFast Multimodality Compensation Fusion Swin Transformer Networks for RGB-D Action Recognition

Bibliographic Details
Main Authors: Li, H. (Author), Ren, Z. (Author), Wei, W. (Author), Xiao, X. (Author), Yang, H. (Author), Yang, Z. (Author)
Format: Article
Language:English
Published: MDPI 2023
Description
Summary: RGB-D-based technology combines the advantages of RGB and depth sequences, which makes it possible to recognize human actions effectively in different environments. However, it is difficult for the different modalities to learn spatio-temporal information from each other effectively. To enhance the information exchange between modalities, we introduce a SlowFast multimodality compensation block (SFMCB) designed to extract compensation features. Concretely, the SFMCB fuses features from two independent pathways with different frame rates into a single convolutional neural network to achieve performance gains for the model. Furthermore, we explore two fusion schemes for combining the features from the two pathways. To facilitate the learning of features from the independent pathways, multiple loss functions are utilized for joint optimization. To evaluate the effectiveness of the proposed architecture, we conducted experiments on four challenging datasets: NTU RGB+D 60, NTU RGB+D 120, THU-READ, and PKU-MMD. Experimental results demonstrate the effectiveness of our model, which uses the SFMCB mechanism to capture complementary features from multimodal inputs. © 2023 by the authors.
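
The abstract only sketches how the SFMCB combines the two pathways. As a rough, hedged illustration (not the paper's implementation), the PyTorch snippet below shows one plausible way to fuse a slow (low frame rate) and a fast (high frame rate) feature pathway by aligning them in time and mixing them with a 1x1x1 convolution. The class name, channel sizes, and the trilinear temporal alignment are assumptions made for this sketch.

import torch
import torch.nn as nn
import torch.nn.functional as F

class CompensationFusionBlock(nn.Module):
    # Illustrative fusion of a slow (few frames) and a fast (many frames)
    # pathway; the real SFMCB design may differ.
    def __init__(self, slow_channels: int, fast_channels: int, out_channels: int):
        super().__init__()
        # Pointwise 3D convolution that mixes the concatenated pathway features.
        self.fuse = nn.Conv3d(slow_channels + fast_channels, out_channels, kernel_size=1)

    def forward(self, slow_feat: torch.Tensor, fast_feat: torch.Tensor) -> torch.Tensor:
        # Both inputs are (batch, channels, time, height, width).
        # Resample the fast pathway in time and space to match the slow pathway
        # before concatenation; an alternative scheme could instead upsample
        # the slow pathway toward the fast one.
        fast_aligned = F.interpolate(
            fast_feat,
            size=tuple(slow_feat.shape[2:]),
            mode="trilinear",
            align_corners=False,
        )
        fused = torch.cat([slow_feat, fast_aligned], dim=1)
        return self.fuse(fused)

# Example: slow pathway sampled at 4 frames, fast pathway at 16 frames.
slow = torch.randn(2, 256, 4, 14, 14)
fast = torch.randn(2, 64, 16, 14, 14)
block = CompensationFusionBlock(slow_channels=256, fast_channels=64, out_channels=256)
print(block(slow, fast).shape)  # torch.Size([2, 256, 4, 14, 14])

In the paper, the fused compensation features are further combined with the backbone outputs and trained with multiple loss functions for joint optimization; those details are beyond this sketch.
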
ISSN: 2227-7390
DOI:10.3390/math11092115