Scene Semantic Understanding Based on the Spatial Context Relations of Multiple Objects

Bibliographic Details
Main Authors: Yanfei Zhong, Siqi Wu, Bei Zhao
Format: Article
Language: English
Published: MDPI AG 2017-10-01
Series: Remote Sensing
Online Access: https://www.mdpi.com/2072-4292/9/10/1030
Summary: As a result of the large semantic gap between low-level features and high-level semantics, scene understanding is a challenging task for high-resolution satellite images. To achieve scene understanding, we need to know the contents of the scene. However, most of the existing scene classification methods, such as the bag-of-visual-words (BoVW) model, feature coding, topic models, and neural networks, can only classify the scene while ignoring its components and the semantic and spatial relations between those components. Therefore, in this paper, a bottom-up scene understanding framework based on the multi-object spatial context relationship model (MOSCRF) is proposed to combine co-occurrence relations and position relations at the object level. In MOSCRF, the co-occurrence relation features are modeled by Fisher kernel coding of objects (oFK), while the position relation features are represented by the multi-object force histogram (MOFH). The MOFH extends the force histogram between pairwise objects: it is not only invariant to rotation and mirroring, but also captures the spatial distribution of the scene by calculating the acting force between multiple land-cover objects. By exploiting prior knowledge of the objects, MOSCRF can explain the objects and their relations, allowing the scene to be understood. The experiments confirm that the proposed MOSCRF can reflect the layout of the scene both semantically and spatially, with higher precision than the traditional methods.
ISSN:2072-4292
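
The multi-object force histogram described in the summary builds on the classic pairwise force histogram, which accumulates an attractive force between every pair of points drawn from two objects into directional bins. The following Python sketch illustrates only this pairwise building block for two binary masks; it is a minimal approximation under assumed conventions (a gravitational-style force 1/d^r and 180 direction bins), not the authors' MOSCRF or MOFH implementation, and the function name pairwise_force_histogram is hypothetical.

```python
import numpy as np

def pairwise_force_histogram(mask_a, mask_b, n_dirs=180, r=2.0):
    """Histogram of the force exerted by object B on object A, by direction.

    mask_a, mask_b: 2-D boolean arrays (binary land-cover object masks).
    n_dirs: number of direction bins covering [0, 2*pi).
    r: force exponent; r=2.0 mimics gravitational attraction 1/d^2.
    """
    ya, xa = np.nonzero(mask_a)
    yb, xb = np.nonzero(mask_b)
    hist = np.zeros(n_dirs)
    for px, py in zip(xa, ya):
        dx = xb - px                     # displacement to every pixel of B
        dy = yb - py
        d = np.hypot(dx, dy)
        keep = d > 0                     # skip coincident pixels
        theta = np.arctan2(dy[keep], dx[keep])            # angle in (-pi, pi]
        bins = ((theta + np.pi) / (2 * np.pi) * n_dirs).astype(int) % n_dirs
        np.add.at(hist, bins, 1.0 / d[keep] ** r)         # accumulate force
    return hist
```

Rotating the scene circularly shifts this histogram, which is why rotation and mirroring invariance, as claimed for the MOFH, can be obtained by a phase normalization step, for example aligning the histogram's dominant direction to bin 0 before comparison.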