An analysis of multi-level filtering for high dimensional image data

Bibliographic Details
Main Author: Tam, Dominic Pok Man
Language: English
Published: 2009
Online Access: http://hdl.handle.net/2429/4435
Description
Summary: Image database systems are useful in many applications. To design an effective image database system, high-dimensional image feature vectors have to be extracted from the images automatically. Each comparison between such vectors tends to be expensive, so sequential comparison is usually impractical. Moreover, traditional multi-dimensional indexing structures cannot handle these high-dimensional vectors efficiently. Thus, it has been proposed to abstract a lower-dimensional k-D vector from the original N-D feature vector, where k ≪ N. 2-level filtering is then used: the k-D vector fits in the indexing structure for coarse filtering, and far fewer comparisons between N-D vectors are needed in the fine filtering stage. Unfortunately, both stages cannot be efficient at the same time. A major contribution of this thesis is the idea of multi-level filtering, which adds intermediate levels so that both the coarsest and the finest filtering stages can be very efficient. Based on the cost models developed, the trends of 2-level and multi-level filtering are analysed and compared. The experimental evaluations further confirm that 3-level filtering usually requires less CPU and I/O time than 2-level filtering does. Compared with the previous approach of 2-level filtering, 3-level filtering can save from 15% to over 400% of the time needed. Another contribution is the development of optimizers that can find the (near-)optimal configuration of 2-level and 3-level filtering for any image dataset. Experimental results show that in about 32 seconds, the developed optimizer can find a configuration whose total run-time per query exceeds that of the true optimal configuration by less than 2.5%.
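
Illustration: the filtering idea in the summary can be sketched in a few lines of Python. This is only a minimal sketch under several assumptions that the record does not confirm: the k-D abstraction is taken to be the first k coordinates of the N-D vector, the distance is Euclidean, the coarse stage is a linear scan rather than an indexing structure, and the names `prefix_distance` and `multi_level_range_query` are hypothetical. With these choices the prefix distance never exceeds the full distance, so each level can discard candidates without losing true matches.

```python
import math
import random


def prefix_distance(a, b, k):
    """Euclidean distance over the first k coordinates only (assumed abstraction)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a[:k], b[:k])))


def multi_level_range_query(query, vectors, levels, radius):
    """Return (matching indices, number of comparisons made at each level).

    `levels` lists the dimensionality used at each filtering stage, coarsest
    first, ending with the full dimensionality N. [k, N] is 2-level filtering;
    [k1, k2, N] is 3-level filtering.
    """
    candidates = list(range(len(vectors)))
    comparisons = []
    for k in levels:
        # Every surviving candidate is compared once at this level.
        comparisons.append(len(candidates))
        candidates = [
            i for i in candidates
            if prefix_distance(query, vectors[i], k) <= radius
        ]
    return candidates, comparisons


# Synthetic data standing in for image feature vectors (not from the thesis).
random.seed(0)
N = 64
data = [[random.random() for _ in range(N)] for _ in range(500)]
query = data[0]
radius = 1.0

two_level, cost2 = multi_level_range_query(query, data, [4, N], radius)
three_level, cost3 = multi_level_range_query(query, data, [4, 16, N], radius)

print("2-level comparisons per stage:", cost2)
print("3-level comparisons per stage:", cost3)
```

Both calls return the same answer set. The intermediate 16-D level in the second call can only reduce the number of expensive N-D comparisons in the final stage, at the price of some extra cheap 16-D comparisons. Weighing that price against the saving is the kind of trade-off the cost models and optimizers mentioned in the summary address.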