Summary: | Learning-based hashing has been widely used in approximate nearest neighbor search for image retrieval. However, most existing hashing methods are designed to learn only simplex feature similarity while ignoring the location similarity among multiple objects, and thus cannot work well on multi-label image retrieval tasks. In this paper, we propose a novel supervised hashing method that fuses the two kinds of similarity. First, we use an adjacency matrix to record the relative location relationships among multiple objects. Second, by incorporating the matrix discretization difference and the image label difference, we redefine the pairwise image similarity in a more fine-grained way. Third, to learn more distinguishable hash codes, we use an attention sub-network to identify the approximate regions of the objects in an image, so that the extracted features focus mainly on the foreground objects and ignore the background clutter. The loss function of our method consists of a multi-category classification loss, which is used to train the attention sub-network, and a hash loss with a scaled sigmoid function, which is used to learn efficient hash codes. Experimental results show that the proposed method is effective in preserving high-level similarities and outperforms the baseline methods in multi-label image retrieval.
|