Handling Tradeoffs between Performance and Query-Result Quality in Data Stream Processing

Bibliographic Details
Main Author: Ji, Yuanzhen
Other Authors: Fetzer, Christof
Format: Doctoral Thesis
Language: English
Published: 2018
Subjects: Datenstromverarbeitung; Data Stream Processing; DDC 004
Online Access: http://nbn-resolving.de/urn:nbn:de:bsz:14-qucosa-233997
https://tud.qucosa.de/id/qucosa%3A29851
https://tud.qucosa.de/api/qucosa%3A29851/attachment/ATT-2/
Institution: Technische Universität Dresden
Other Contributors: Falber, Pascal
Access: Open access
Dates: 2017-11-28; 2017-12-06; 2018-03-27
Collection: NDLTD

Abstract:
Data streams in the form of potentially unbounded sequences of tuples arise naturally in a wide variety of domains, including financial markets, sensor networks, social media, and network traffic management. The growing number of applications that must process data streams with high throughput and low latency has driven the development of data stream processing systems (DSPS). A DSPS processes data streams with continuous queries, which are issued once and then return results to users continuously as new tuples arrive.

For stream-based applications, both query-execution performance (e.g., throughput and end-to-end latency) and the quality of the produced query results (e.g., accuracy and completeness) are important. However, a DSPS often has to trade one against the other, either because of imperfections in the streamed data or because of the limited computational capacity of the DSPS itself. Tradeoffs caused by data imperfection are inevitable, because the quality of the incoming data is beyond the control of a DSPS; tradeoffs caused by system limitations, in contrast, can be alleviated or even eliminated by enhancing the DSPS. This dissertation seeks to advance the state of the art in handling performance versus result-quality tradeoffs arising from both of these causes.

For tradeoffs caused by data imperfection, the dissertation focuses on stream disorder, a typical data-imperfection problem, and proposes the concept of quality-driven disorder handling (QDDH). QDDH enables a DSPS to make flexible, user-configurable tradeoffs between end-to-end latency and query-result quality when dealing with stream disorder. Moreover, compared with existing disorder-handling approaches, QDDH can significantly reduce end-to-end latency while still providing users with the desired query-result quality.
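The latency-versus-completeness tradeoff behind buffer-based disorder handling can be illustrated with K-slack, a standard reordering buffer from the stream-processing literature. This is a minimal sketch, not QDDH itself: the fixed slack `k`, the `(timestamp, payload)` tuple shape, and the function name `k_slack` are assumptions made for illustration, and `k` merely stands in for whatever buffer-sizing policy a quality-driven framework would apply.

```python
import heapq
from typing import Iterable, Iterator, List, Optional, Tuple

Event = Tuple[int, str]  # (event timestamp, payload); hypothetical tuple shape


def k_slack(stream: Iterable[Event], k: int) -> Iterator[Event]:
    """Reorder an out-of-order stream with a K-slack buffer (illustrative).

    A buffered tuple is released once a tuple at least `k` time units newer
    has arrived. Tuples older than the last released one can no longer be
    placed in order and are dropped. A larger `k` keeps more tuples (more
    complete results) but holds each tuple longer (higher latency)."""
    heap: List[Tuple[int, int, str]] = []  # (timestamp, arrival no., payload)
    max_ts: Optional[int] = None
    last_out: Optional[int] = None
    for arrival, (ts, payload) in enumerate(stream):
        if last_out is not None and ts < last_out:
            continue  # arrived too late to be emitted in order: drop it
        # The arrival number breaks timestamp ties, so payloads are never compared.
        heapq.heappush(heap, (ts, arrival, payload))
        max_ts = ts if max_ts is None else max(max_ts, ts)
        while heap and heap[0][0] <= max_ts - k:
            ts_out, _, payload_out = heapq.heappop(heap)
            last_out = ts_out
            yield ts_out, payload_out
    while heap:  # finite input ended: flush whatever is still buffered
        ts_out, _, payload_out = heapq.heappop(heap)
        yield ts_out, payload_out


if __name__ == "__main__":
    events = [(1, "a"), (3, "b"), (2, "c"), (5, "d"), (4, "e"), (9, "f"), (6, "g")]
    for k in (0, 3):
        print(k, [ts for ts, _ in k_slack(events, k)])
```

With `k = 0` the buffer adds no delay but drops three of the seven tuples as late. With `k = 3` every tuple is kept and emitted in timestamp order, but each one waits until a tuple at least three time units newer has arrived (the last one leaves only when the input ends and the buffer is flushed).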
The dissertation presents a generic buffer-based QDDH framework and three instantiations of this framework for distinct query types.

For tradeoffs caused by system limitations, the dissertation proposes a system-enhancement approach that improves throughput by combining row-oriented and column-oriented data layouts and processing techniques in data stream processing. To fully exploit the potential of this hybrid execution of continuous queries, it introduces a static, cost-based query optimizer. The optimizer works at the operator level and takes into account feasibility, a property unique to execution plans of continuous queries.
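A toy, operator-level layout chooser may help make the cost-based optimizer described above concrete. Everything in it is hypothetical: the three-operator pipeline, the per-tuple costs, the conversion cost, the one-core-per-operator model, and the rule that a plan is feasible only if no operator needs more than 100% of its core. The sketch enumerates every row/column assignment and keeps the cheapest feasible one; it is not the optimizer from the dissertation.

```python
from itertools import product
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical per-tuple costs (microseconds) of each operator when executed
# on a row-oriented ("row") or column-oriented ("col") layout. All operators
# are assumed to see every input tuple (no filtering).
OPERATORS: List[Tuple[str, Dict[str, float]]] = [
    ("project", {"row": 0.9, "col": 0.2}),
    ("compute", {"row": 0.9, "col": 0.2}),
    ("aggregate", {"row": 0.6, "col": 1.2}),
]
CONVERT_US = 0.35  # hypothetical per-tuple cost of switching layouts


def utilizations(plan: Sequence[str], rate: float) -> List[float]:
    """Fraction of one core each operator needs at `rate` tuples/second.

    A layout conversion is charged to the operator that consumes it; the
    source is assumed to deliver row-oriented tuples."""
    result: List[float] = []
    prev = "row"  # assumed source layout
    for (_, costs), layout in zip(OPERATORS, plan):
        cost_us = costs[layout] + (CONVERT_US if layout != prev else 0.0)
        result.append(rate * cost_us * 1e-6)
        prev = layout
    return result


def optimize(rate: float) -> Optional[Tuple[str, ...]]:
    """Return the cheapest feasible layout assignment, or None if none exists.

    Assumed feasibility rule: no operator may need more than 100% of its
    core, otherwise its input queue would grow without bound. Cost is the
    total CPU time used across all operators."""
    best: Optional[Tuple[str, ...]] = None
    best_cost = float("inf")
    for plan in product(("row", "col"), repeat=len(OPERATORS)):
        util = utilizations(plan, rate)
        if max(util) <= 1.0 and sum(util) < best_cost:
            best, best_cost = plan, sum(util)
    return best


if __name__ == "__main__":
    for rate in (1.0e6, 1.08e6, 1.2e6):
        print(f"{rate:,.0f} tuples/s -> {optimize(rate)}")
```

With these made-up numbers, at 1,000,000 tuples/s the cheapest plan is hybrid, `('col', 'col', 'row')`. At 1,080,000 tuples/s that plan would overload the aggregate operator, so the sketch falls back to the more expensive but feasible all-row plan; at 1,200,000 tuples/s no plan is feasible and it returns `None`. Exhaustive enumeration grows as 2^n in the number of operators, so it only suits small illustrative pipelines.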