Summary: | This paper addresses the problem of recognizing specific objects in very large datasets. A common approach has been based on the bag-of-words (BOW) method, in which local image features are clustered into visual words, providing memory
savings through feature quantization. In this paper we take an additional step to reducing memory requirements by selecting only a small subset of the training features to use for recognition. This approach, which we name Robust Feature Selection (RFS), is based on the observation that many local features are unreliable
or represent irrelevant clutter. We are able to select “maximally robust” features by an unsupervised preprocessing step that identifies correctly matching features among the training images. We demonstrate that this selection approach allows an average of 4% of the original features per image to provide matching performance
that is as accurate as the full set in the Oxford Buildings dataset. In addition, we employ a graph to represent the matching relationships between images. Doing so enables us to effectively augment the feature set for each image by merging them with maximally robust features from neighbouring images. We demonstrate adjacent and 2-adjacent augmentation, both of which give a substantial boost in recognition performance.
|