Summary: | Remote sensing image scene classification is an important means for the understanding of remote sensing images. Convolutional neural networks (CNNs) have been successfully applied to remote sensing image scene classification and have demonstrated remarkable performance. However, with improvements in image resolution, remote sensing image categories are becoming increasingly diverse, and problems such as high intraclass diversity and high interclass similarity have arisen. The performance of ordinary CNNs at distinguishing increasingly complex remote sensing images is still limited. Therefore, we propose a feature fusion framework based on hierarchical attention and bilinear pooling called HABFNet for the scene classification of remote sensing images. First, the deep CNN ResNet50 is used to extract the deep features from different layers of the image, and these features are fused to boost their robustness and effectiveness. Second, we design an improved channel attention scheme to enhance the features from different layers. Finally, the enhanced features are cross-layer bilinearly pooled and fused, and the fused features are used for classification. Extensive experiments were conducted on three publicly available remote sensing image benchmarks. Comparisons with the state-of-the-art methods demonstrated that the proposed HABFNet achieved competitive classification performance.
|