Model-Based Diversification for Sequential Exploratory Queries

Abstract: Today, data exploration platforms are widely used to assist users in locating interesting objects within large volumes of scientific and business data. In those platforms, users try to make sense of the underlying data space by iteratively posing numerous queries over large databases. While...

Bibliographic Details
Main Authors:	Hina A. Khan, Mohamed A Sharaf
Format:	Article
Language:	English
Published:	SpringerOpen 2017-03-01
Series:	Data Science and Engineering
Subjects:	Algorithms; Design; Diversification; Performance; Query processing
Online Access:	http://link.springer.com/article/10.1007/s41019-017-0038-0

id	doaj-8e72a771790c4d49b9f22e033281f2bc
record_format	Article
spelling	doaj-8e72a771790c4d49b9f22e033281f2bc; 2021-03-02T05:18:06Z; eng; SpringerOpen; Data Science and Engineering; ISSN 2364-1185, 2364-1541; 2017-03-01; vol. 2, no. 2, pp. 151-168; doi:10.1007/s41019-017-0038-0; Model-Based Diversification for Sequential Exploratory Queries; Hina A. Khan (University of Queensland); Mohamed A Sharaf (University of Queensland); http://link.springer.com/article/10.1007/s41019-017-0038-0
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Hina A. Khan; Mohamed A Sharaf
spellingShingle	Hina A. Khan; Mohamed A Sharaf; Model-Based Diversification for Sequential Exploratory Queries; Data Science and Engineering; Algorithms; Design; Diversification; Performance; Query processing
author_facet	Hina A. Khan; Mohamed A Sharaf
author_sort	Hina A. Khan
title	Model-Based Diversification for Sequential Exploratory Queries
title_sort	model-based diversification for sequential exploratory queries
publisher	SpringerOpen
series	Data Science and Engineering
issn	2364-1185 2364-1541
publishDate	2017-03-01
description	Today, data exploration platforms are widely used to assist users in locating interesting objects within large volumes of scientific and business data. In those platforms, users try to make sense of the underlying data space by iteratively posing numerous queries over large databases. While diversification of query results, like other data summarization techniques, gives users quick insight into a huge query answer space, it adds complexity to an already computationally expensive data exploration task. To address this challenge, in this paper we propose a diversification scheme that targets the problem of efficiently diversifying the results of multiple queries within and across different data exploration sessions. Our proposed scheme relies on a model-based diversification method and an ordered cache. In particular, we employ an adaptive regression model to estimate the diversity of a diverse subset. This estimate of the diversity value allows us to select diverse results without scanning all the query results. To further expedite the diversification process, we propose an order-based caching scheme that leverages the overlap among a sequence of data exploration queries. Our extensive experimental evaluation on both synthetic and real data sets shows the significant benefits of our scheme compared to existing methods.
topic	Algorithms; Design; Diversification; Performance; Query processing
url	http://link.springer.com/article/10.1007/s41019-017-0038-0
work_keys_str_mv	AT hinaakhan modelbaseddiversificationforsequentialexploratoryqueries AT mohamedasharaf modelbaseddiversificationforsequentialexploratoryqueries
_version_	1724242585167331328

Similar Items