Advanced analysis and join queries in multidimensional spaces

Multidimensional data are ubiquitous, and their efficient management and analysis is a core database research problem. Much prior work has focused on indexing, analyzing, and querying multidimensional data. In this dissertation, three challenging advanced analysis and join problems in m...

Bibliographic Details
Main Authors: Ge, Shen., 葛屾.
Other Authors: Mamoulis, N
Language: English
Published: The University of Hong Kong (Pokfulam, Hong Kong) 2013
Subjects: Multidimensional databases, Database management
Online Access: http://hdl.handle.net/10722/181500
id ndltd-HKU-oai-hub.hku.hk-10722-181500
record_format oai_dc
spelling ndltd-HKU-oai-hub.hku.hk-10722-181500 2015-07-29T04:02:04Z Advanced analysis and join queries in multidimensional spaces Ge, Shen. 葛屾. Mamoulis, N Cheung, DWL Multidimensional databases. Database management. published_or_final_version Computer Science Doctoral Doctor of Philosophy 2013-03-03T03:20:15Z 2013-03-03T03:20:15Z 2013 2012 PG_Thesis 10.5353/th_b4979933 b4979933 http://hdl.handle.net/10722/181500 eng HKU Theses Online (HKUTO) The author retains all proprietary rights, (such as patent rights) and the right to use in future works. Creative Commons: Attribution 3.0 Hong Kong License The University of Hong Kong (Pokfulam, Hong Kong) http://hub.hku.hk/bib/B49799332
collection NDLTD
language English
sources NDLTD
topic Multidimensional databases.
Database management.
spellingShingle Multidimensional databases.
Database management.
Ge, Shen.
葛屾.
Advanced analysis and join queries in multidimensional spaces
description Multidimensional data are ubiquitous, and their efficient management and analysis is a core database research problem. Much prior work has focused on indexing, analyzing, and querying multidimensional data. In this dissertation, three challenging advanced analysis and join problems in multidimensional spaces are proposed and studied, and efficient solutions are provided for their related applications.

First, the generalized budget constrained optimization query (Gen-BOQ) is studied. In practice, constraints often make it difficult for manufacturers to create new products that dominate those of their competitors. These constraints can be modeled by constraint functions, and the problem is then to find the best regions of the multidimensional space in which the features of new products could be placed. The profitability of these regions is evaluated using the numbers of dominating and dominated objects, and the best regions are returned. Although Gen-BOQ computation is challenging due to its high complexity, an efficient divide-and-conquer framework is offered for this problem. In addition, an approximation method is proposed that trades result quality against query cost.

Next, the efficient evaluation of all top-k queries (ATOPk) in multidimensional spaces is investigated. Such a query computes the top-ranked objects for a group of preference functions simultaneously. As an application, consider an online store that needs to provide recommendations for a large number of users at once. This problem has been somewhat overlooked by past research; in this thesis, batch algorithms are proposed instead of naïvely evaluating each top-k query individually. Similar preferences are grouped together, and two algorithms are proposed: one using block indexed nested loops and one using a view-based thresholding strategy. The optimized view-based threshold algorithm is demonstrated to be consistently the best. Moreover, an all top-k query helps to evaluate other queries that rely on the results of multiple top-k queries, such as the reverse top-k queries and top-m influential queries proposed in previous work. It is shown that applying the view-based approach to these queries can improve on the performance of the current state of the art by orders of magnitude.

Finally, the problem of spatio-textual similarity joins (ST-SJOIN) on multidimensional data is considered. Given both spatial and textual information, ST-SJOIN retrieves pairs of objects that are both spatially close and textually similar. One possible application of this query is friendship recommendation, which matches people who not only live nearby but also share common interests. By combining state-of-the-art strategies for spatial distance joins and set similarity joins, efficient query processing algorithms are proposed that take both spatial and textual constraints into account. A batch processing strategy is also introduced to boost performance; it is also effective for the original textual-only joins. Using synthetic and real datasets, it is shown that the proposed techniques outperform the baseline solutions. === published_or_final_version === Computer Science === Doctoral === Doctor of Philosophy
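The three query types named in the description can be made concrete with brute-force definitions. The Python sketch below is illustrative only and is not the dissertation's method: it shows the naive baselines that the proposed algorithms are meant to improve on. All function names are hypothetical, and several details are assumptions not stated in the record: larger attribute values are taken to be better, preference functions are taken to be linear weight vectors, spatial closeness is Euclidean distance at most `eps`, and textual similarity is Jaccard similarity at least `theta`. For Gen-BOQ, only the dominance-counting ingredient is shown, for a single candidate point rather than a region.

```python
from itertools import combinations
from math import dist
import heapq


def dominates(a, b):
    """True if a is no worse than b in every dimension and better in at least one.
    Assumption: larger values are better in every dimension."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def dominance_counts(candidate, objects):
    """Return (objects dominated by candidate, objects dominating candidate)."""
    dominated = sum(dominates(candidate, o) for o in objects)
    dominating = sum(dominates(o, candidate) for o in objects)
    return dominated, dominating


def all_top_k_naive(objects, preferences, k):
    """Answer one top-k query per preference vector, each independently.
    Assumption: linear scoring, score = sum(w_i * x_i), higher is better."""
    results = []
    for w in preferences:
        def score(o, w=w):
            return sum(wi * xi for wi, xi in zip(w, o))
        results.append(heapq.nlargest(k, objects, key=score))
    return results


def st_sjoin_naive(objects, eps, theta):
    """Return index pairs (i, j) that are spatially close and textually similar.
    objects is a list of (point, token_set) tuples; every pair is compared."""
    pairs = []
    for (i, (p, s)), (j, (q, t)) in combinations(enumerate(objects), 2):
        union = s | t
        jaccard = len(s & t) / len(union) if union else 0.0
        if dist(p, q) <= eps and jaccard >= theta:
            pairs.append((i, j))
    return pairs
```

Each baseline scans all objects (or all pairs) for every query, which is the cost that batching, indexing, and threshold-based pruning aim to avoid.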
author2 Mamoulis, N
author_facet Mamoulis, N
Ge, Shen.
葛屾.
author Ge, Shen.
葛屾.
author_sort Ge, Shen.
title Advanced analysis and join queries in multidimensional spaces
title_short Advanced analysis and join queries in multidimensional spaces
title_full Advanced analysis and join queries in multidimensional spaces
title_fullStr Advanced analysis and join queries in multidimensional spaces
title_full_unstemmed Advanced analysis and join queries in multidimensional spaces
title_sort advanced analysis and join queries in multidimensional spaces
publisher The University of Hong Kong (Pokfulam, Hong Kong)
publishDate 2013
url http://hdl.handle.net/10722/181500
work_keys_str_mv AT geshen advancedanalysisandjoinqueriesinmultidimensionalspaces
AT géshēn advancedanalysisandjoinqueriesinmultidimensionalspaces
_version_ 1716813668055777280