Compositional compression of deep image features using stacked quantizers

Bibliographic Details
Main Author: Martinez-Covarrubias, Julieta
Language: English
Published: University of British Columbia 2014
Online Access: http://hdl.handle.net/2429/51511
Description
Summary: In computer vision, it is common for image representations to be stored as high-dimensional real-valued vectors. In many computer vision applications, such as retrieval, classification, registration and reconstruction, the computational bottleneck arises in a process known as feature matching, where, given a query vector, a similarity score has to be computed against many vectors in a (potentially very large) database. For example, it is not uncommon for object retrieval and classification to be performed by matching global representations in collections with thousands or millions of images. A popular approach to reducing the computational and memory requirements of this process is vector quantization. In this work, we first analyze several vector compression methods typically used in the computer vision literature in terms of their computational trade-offs. In particular, we observe that Product Quantization (PQ) and Additive Quantization (AQ) lie at the extremes of a compositional vector compression design choice, where the former assumes complete codebook independence and the latter assumes full codebook dependence. We explore an intermediate approach that exploits a hierarchical structure in the codebooks. This results in a method that is largely competitive with AQ on structured vectors, and outperforms AQ on unstructured vectors while being several orders of magnitude faster. We also perform an extensive evaluation of our method on standard benchmarks of Scale Invariant Feature Transform (SIFT) and GIST descriptors, as well as on new datasets of features obtained from state-of-the-art convolutional neural networks. In benchmarks of low-dimensional deep features, our approach obtains the best results known to date, often requiring less than half the memory of PQ to achieve the same performance.

Science, Faculty of; Computer Science, Department of; Graduate
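The contrast the summary draws between independent and hierarchical codebooks can be illustrated with a small sketch. The code below is not taken from the thesis: it assumes that the hierarchical structure can be approximated by plain greedy residual quantization, where each codebook encodes what the previous ones left over, and it uses tiny hand-written codebooks instead of learned ones. All function names and values are hypothetical and chosen only for illustration.

```python
from typing import List

Vector = List[float]
Codebook = List[Vector]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(codebook: Codebook, x: Vector) -> int:
    """Index of the codeword closest to x."""
    return min(range(len(codebook)), key=lambda i: sq_dist(codebook[i], x))


def pq_encode(codebooks: List[Codebook], x: Vector) -> List[int]:
    """Product quantization: each codebook covers its own disjoint
    slice of dimensions, so the codebooks are fully independent."""
    codes, start = [], 0
    for cb in codebooks:
        width = len(cb[0])
        codes.append(nearest(cb, x[start:start + width]))
        start += width
    return codes


def pq_decode(codebooks: List[Codebook], codes: List[int]) -> Vector:
    """Concatenate the selected sub-codewords."""
    out: Vector = []
    for cb, c in zip(codebooks, codes):
        out.extend(cb[c])
    return out


def residual_encode(codebooks: List[Codebook], x: Vector) -> List[int]:
    """Greedy residual quantization (an assumed stand-in for a
    hierarchical scheme): every codebook is full-dimensional and
    quantizes the residual left by the codebooks before it."""
    residual, codes = list(x), []
    for cb in codebooks:
        c = nearest(cb, residual)
        codes.append(c)
        residual = [r - v for r, v in zip(residual, cb[c])]
    return codes


def residual_decode(codebooks: List[Codebook], codes: List[int]) -> Vector:
    """Sum the selected full-dimensional codewords."""
    out = [0.0] * len(codebooks[0][0])
    for cb, c in zip(codebooks, codes):
        out = [o + v for o, v in zip(out, cb[c])]
    return out


# Toy example with made-up codebooks (two codebooks of two codewords each).
x = [0.9, 1.8]
pq_books = [[[0.0], [1.0]], [[0.0], [2.0]]]
rq_books = [[[0.0, 0.0], [1.0, 2.0]], [[0.0, 0.0], [-0.1, -0.2]]]

pq_codes = pq_encode(pq_books, x)
rq_codes = residual_encode(rq_books, x)
pq_error = sq_dist(x, pq_decode(pq_books, pq_codes))
rq_error = sq_dist(x, residual_decode(rq_books, rq_codes))
```

Both encoders store the same number of codes per vector. In this toy setup the second residual codebook can correct the error of the first, which a codebook restricted to its own slice of dimensions cannot do. The example says nothing about the accuracy or speed reported in the thesis.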