A Study on Top-k Dominance in Metric Space using R-trees

Bibliographic Details
Main Author: Muzwandile Z. W. Makhubu
Other Authors: 劉傳銘
Format: Others
Published: 2016
Online Access: http://ndltd.ncl.edu.tw/handle/zhnjv6
Description
Summary: Master's thesis === National Taipei University of Technology === Graduate Institute of Computer Science and Information Engineering === 104 === Top-k dominating queries are an important tool for ‘similarity search’ in database and decision support applications. A top-k dominating query returns the k data items with the highest dominance scores in a dataset. It combines the advantages of two powerful preference query techniques, the top-k query and the skyline query, while mitigating their individual disadvantages. A great deal of work has been done on the top-k dominating query problem over multivariate datasets, where data items are defined as multidimensional points. Most of that work handles static datasets and, to a lesser extent, distance-based dynamic data. In this work the top-k dominating query is performed over metric space data, which poses different challenges from those encountered with the multivariate data mentioned above. In this scenario we have data objects and their distances to a set of input query objects, and these distances can change dynamically as input query objects are generated. Two algorithms are developed to solve the problem of top-k dominating queries in metric space. Typically, metric space index structures such as the M-tree would be used in such a situation. In this work we show how to use R-trees efficiently in a metric space instead. Moreover, this work also investigates means of reducing the memory footprint of the processing algorithm. This is an important direction because it makes the top-k dominating query solution applicable to wireless broadcast environments where the processing nodes may have limited resources (e.g. wireless sensor networks). We were able to show that the R-tree can be used effectively to index data that would typically be indexed using metric space indexes. Moreover, the algorithms described are capable of finding the top-k dominating results without first finding the exact dominance score of each object.
The two proposed algorithms are called the Direct-Top-k Dominating (D-TKD) Query algorithm and the Enhanced-Top-k Dominating (E-TKD) Query algorithm. The D-TKD algorithm solves the problem without the use of any sophisticated indexing scheme, while the E-TKD algorithm employs the R-tree as an index. We show the performance of these two algorithms for different dataset sizes, query set sizes, and result sizes. We demonstrate the performance improvement that results from using the R-tree index in the E-TKD method.
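
To make the query definition concrete, the following is a minimal brute-force sketch in Python. It is not the D-TKD or E-TKD algorithm, whose details this record does not give. It assumes the common convention that one object dominates another when it is no farther from every query object and strictly closer to at least one. The names `dominates` and `top_k_dominating`, and the absolute-difference distance in the usage lines, are hypothetical choices made for illustration.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def dominates(da: Sequence[float], db: Sequence[float]) -> bool:
    """Return True if distance vector da dominates db.

    Assumed convention: da is no larger than db in every position
    and strictly smaller in at least one position.
    """
    no_worse = all(x <= y for x, y in zip(da, db))
    strictly_better = any(x < y for x, y in zip(da, db))
    return no_worse and strictly_better


def top_k_dominating(
    objects: Sequence[T],
    queries: Sequence[T],
    dist: Callable[[T, T], float],
    k: int,
) -> List[Tuple[T, int]]:
    """Brute-force top-k dominating query (illustrative only).

    Each object is described by its vector of distances to the query
    objects. Its score is the number of other objects it dominates.
    """
    vectors = [[dist(o, q) for q in queries] for o in objects]
    # An object never dominates itself, because the strict test fails.
    scores = [sum(dominates(v, w) for w in vectors) for v in vectors]
    # sorted() is stable, so ties keep their input order.
    ranked = sorted(range(len(objects)), key=lambda i: -scores[i])
    return [(objects[i], scores[i]) for i in ranked[:k]]


# Usage with 1-D points and absolute difference as the (assumed) metric.
result = top_k_dominating([1, 2, 5, 9], [0, 3], lambda a, b: abs(a - b), 2)
print(result)
```

This sketch computes every exact dominance score with a quadratic number of comparisons, which is the cost that, according to the summary, the described algorithms avoid.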