PVCLN: Point-View Complementary Learning Network for 3D Shape Recognition

Bibliographic Details
Main Authors: Shanlin Sun, Yun Li, Minjie Ren, Guo Li, Xing Yao
Format: Article
Language: English
Published: IEEE 2021-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/9309309/
Description
Summary: As an important topic in computer vision and multimedia analysis, 3D shape recognition has attracted much research attention in recent years. Various approaches with remarkable performance have been proposed for point cloud data and for multiview data. However, few works employ point cloud data and multiview data simultaneously to represent 3D shapes, even though the two modalities are complementary and, in our view, mutually beneficial. Moreover, existing multimodal approaches mainly focus on the fusion strategy or on exploring the relation between the modalities, while ignoring the intra-modality characteristic information and the inter-modality complementary information. In this paper, we tackle these limitations by introducing a novel Point-View Complementary Learning Network (PVCLN) that explores the potential of both the complementary information and the characteristic information for 3D shape recognition. Inspired by the success of graph neural networks in capturing relations between features, we introduce a novel multimodal fusion strategy. Concretely, we first extract the visual feature from the multiview data and the structural feature from the point cloud data separately. We then project the visual and structural features into the same feature space and learn the complementary information between the two modalities by modeling the inter-modality affinities. The characteristic information in each modality is preserved by additionally considering the intra-modality affinities. Together, the intra-modality and inter-modality affinities compensate for missing characteristic information and enhance the complementary information during feature learning. Finally, the updated visual and structural features are combined into a unified representation of the 3D shape. We conduct extensive experiments to validate the superiority of the overall network and the effectiveness of each component. The proposed method is evaluated on the ModelNet40 dataset, and the experimental results demonstrate that our framework achieves competitive performance in the 3D shape recognition task.
ISSN: 2169-3536
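
The summary above describes the architecture only at a high level, and the authors' code is not part of this record. Purely as an illustration of the described scheme (project both modalities into a shared feature space, update each modality with intra-modality and inter-modality affinities, then fuse into a unified representation), here is a minimal PyTorch-style sketch. All module names, dimensions, and the attention-style affinity formulation are assumptions for illustration, not the paper's actual implementation.

```python
import torch
import torch.nn as nn


class AffinityBlock(nn.Module):
    """Attention-style affinity: updates query features x from key/value features y.

    This is one plausible way to 'model affinities'; the paper may use a
    different graph formulation.
    """

    def __init__(self, dim):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        self.scale = dim ** -0.5

    def forward(self, x, y):
        # (B, n, m) affinity matrix between the tokens of x and the tokens of y.
        affinity = torch.softmax(self.q(x) @ self.k(y).transpose(-2, -1) * self.scale, dim=-1)
        return affinity @ self.v(y)


class PointViewFusion(nn.Module):
    """Hypothetical point-view fusion head in the spirit of PVCLN (not the authors' code).

    view_feat:  (B, n_views,  view_dim)  per-view features from an image backbone
    point_feat: (B, n_groups, point_dim) per-region features from a point-cloud backbone
    """

    def __init__(self, view_dim, point_dim, dim, num_classes):
        super().__init__()
        self.proj_v = nn.Linear(view_dim, dim)    # project views into the shared space
        self.proj_p = nn.Linear(point_dim, dim)   # project points into the shared space
        self.intra_v = AffinityBlock(dim)   # view  <- view:  characteristic information
        self.intra_p = AffinityBlock(dim)   # point <- point: characteristic information
        self.inter_vp = AffinityBlock(dim)  # view  <- point: complementary information
        self.inter_pv = AffinityBlock(dim)  # point <- view:  complementary information
        self.classifier = nn.Linear(2 * dim, num_classes)

    def forward(self, view_feat, point_feat):
        v = self.proj_v(view_feat)
        p = self.proj_p(point_feat)
        # Residual updates: each modality keeps its own information (intra affinity)
        # and absorbs complementary cues from the other modality (inter affinity).
        dv = self.intra_v(v, v) + self.inter_vp(v, p)
        dp = self.intra_p(p, p) + self.inter_pv(p, v)
        v, p = v + dv, p + dp
        # Pool each modality, then concatenate into a unified shape representation.
        fused = torch.cat([v.mean(dim=1), p.mean(dim=1)], dim=-1)
        return self.classifier(fused)


# Example: 12 rendered views and 64 point-cloud regions per shape, 40 ModelNet40 classes.
model = PointViewFusion(view_dim=512, point_dim=256, dim=256, num_classes=40)
logits = model(torch.randn(2, 12, 512), torch.randn(2, 64, 256))  # -> (2, 40)
```

The residual form of the updates is a deliberate choice in this sketch: adding the affinity outputs to the projected features lets each modality retain its characteristic information while the cross-modal terms inject complementary cues, matching the compensation-and-enhancement behavior the abstract describes.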