Window Queries Over Data Streams

Evaluating queries over data streams has become an appealing way to support various stream-processing applications. Window queries are commonly used in many stream applications. In a window query, certain query operators, especially blocking operators and stateful operators, appear in their windowed...

Bibliographic Details
Main Author: Li, Jin
Format: Others
Published: PDXScholar 2008
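Subjects: Querying (Computer science); Streaming technology (Telecommunications); Electronic data processing; Computer Engineering; Computer Sciences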
Online Access: https://pdxscholar.library.pdx.edu/open_access_etds/2675
https://pdxscholar.library.pdx.edu/cgi/viewcontent.cgi?article=3677&context=open_access_etds
id ndltd-pdx.edu-oai-pdxscholar.library.pdx.edu-open_access_etds-3677
record_format oai_dc
spelling ndltd-pdx.edu-oai-pdxscholar.library.pdx.edu-open_access_etds-3677 2019-10-20T04:46:25Z Window Queries Over Data Streams Li, Jin 2008-10-01T07:00:00Z text application/pdf https://pdxscholar.library.pdx.edu/open_access_etds/2675 https://pdxscholar.library.pdx.edu/cgi/viewcontent.cgi?article=3677&context=open_access_etds Dissertations and Theses PDXScholar Querying (Computer science) Streaming technology (Telecommunications) Electronic data processing Computer Engineering Computer Sciences
collection NDLTD
format Others
sources NDLTD
topic Querying (Computer science)
Streaming technology (Telecommunications)
Electronic data processing
Computer Engineering
Computer Sciences
description Evaluating queries over data streams has become an appealing way to support various stream-processing applications. Window queries are commonly used in many stream applications. In a window query, certain query operators, especially blocking operators and stateful operators, appear in their windowed versions. Previous research on evaluating window queries typically requires ordered streams, and this order requirement limits the implementations of window operators and also carries performance penalties. This thesis presents efficient and flexible algorithms for evaluating window queries. We first present a new data model for streams, progressing streams, that separates stream progress from physical-arrival order. Then, we present our window semantic definitions for the most commonly used window operators: window aggregation and window join. Unlike previous research that often requires ordered streams when describing window semantics, our window semantic definitions do not rely on physical-stream arrival properties. Based on the window semantic definitions, we present new implementations of window aggregation and window join, WID and OA-Join. Compared to the existing implementations of stream query operators, our implementations do not require special stream-arrival properties, particularly stream order. In addition, for window aggregation, we present two other implementations extended from WID, Paned-WID and AdaptWID, to improve execution time by sharing sub-aggregates and to improve memory usage for input with data distribution skew, respectively. Leveraging our order-insensitive implementations of window operators, we present a new architecture for stream systems, OOP (Out-of-Order Processing). Instead of relying on ordered streams to indicate stream progress, OOP explicitly communicates stream progress to query operators, and thus is more flexible than the previous in-order processing (IOP) approach, which requires maintaining stream order. We implemented our order-insensitive window query operators and the OOP architecture in NiagaraST and Gigascope. Our performance study in both systems confirms the benefits of our window operator implementations and the OOP architecture compared to the commonly used approaches in terms of memory usage, execution time and latency.
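
The idea in the description, aggregating per window without depending on arrival order and closing windows only on an explicit progress signal, can be illustrated with a minimal Python sketch. This is not the thesis's WID or OOP implementation. The names `window_ids`, `WindowSum`, `on_tuple` and `on_punctuation` are hypothetical, and the sketch assumes integer timestamps, sliding windows covering `[w * slide, w * slide + range_)`, and a progress marker meaning "no later tuple has a timestamp below `ts`".

```python
from collections import defaultdict


def window_ids(ts, range_, slide):
    """Return the ids of all sliding windows that contain timestamp ts.

    Assumption: window w covers [w * slide, w * slide + range_).
    """
    last = ts // slide
    first = max(0, (ts - range_) // slide + 1)
    return range(first, last + 1)


class WindowSum:
    """Order-insensitive windowed SUM (hypothetical illustration)."""

    def __init__(self, range_, slide):
        self.range_ = range_
        self.slide = slide
        self.partial = defaultdict(int)  # window id -> running sum

    def on_tuple(self, ts, value):
        # Tuples may arrive in any order: each one is routed to its
        # windows by timestamp alone, never by arrival position.
        for w in window_ids(ts, self.range_, self.slide):
            self.partial[w] += value

    def on_punctuation(self, ts):
        # Progress marker (assumed meaning): no later tuple has a
        # timestamp below ts, so windows ending at or before ts are final.
        closed = sorted(
            w for w in self.partial if w * self.slide + self.range_ <= ts
        )
        return [(w, self.partial.pop(w)) for w in closed]


agg = WindowSum(range_=10, slide=5)
for ts, value in [(12, 1), (3, 2), (7, 4), (11, 8)]:  # out-of-order arrival
    agg.on_tuple(ts, value)

first_batch = agg.on_punctuation(15)   # [(0, 6), (1, 13)]
second_batch = agg.on_punctuation(20)  # [(2, 9)]
```

In this sketch, results depend only on tuple timestamps and on the progress markers, so shuffling the input between two markers leaves the output unchanged.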
author Li, Jin
title Window Queries Over Data Streams
publisher PDXScholar
publishDate 2008
url https://pdxscholar.library.pdx.edu/open_access_etds/2675
https://pdxscholar.library.pdx.edu/cgi/viewcontent.cgi?article=3677&context=open_access_etds