Sliding Window Top-K Monitoring over Distributed Data Streams

Abstract Most of the traditional top-k algorithms are based on a single-server setting. They may be highly inefficient and/or cause huge communication overhead when applied to a distributed system environment. Therefore, the problem of top-k monitoring in distributed environments has been intensivel...

Full description

Bibliographic Details
Main Authors:	Ben Chen, Zhijin Lv, Xiaohui Yu, Yang Liu
Format:	Article
Language:	English
Published:	SpringerOpen 2017-11-01
Series:	Data Science and Engineering
Subjects:	Top-K Distributed monitoring Data stream Stream processing
Online Access:	http://link.springer.com/article/10.1007/s41019-017-0053-1

Description
Summary:	Abstract Most of the traditional top-k algorithms are based on a single-server setting. They may be highly inefficient and/or cause huge communication overhead when applied to a distributed system environment. Therefore, the problem of top-k monitoring in distributed environments has been intensively investigated recently. This paper studies how to monitor the top-k data objects with the largest aggregate numeric values from distributed data streams within a fixed-size monitoring window W, while minimizing communication cost across the network. We propose a novel algorithm, which adaptively reallocates numeric values of data objects among distributed nodes by assigning revision factors when local constraints are violated and keeps the local top-k result at distributed nodes in line with the global top-k result. We also develop a framework that combines a distributed data stream monitoring architecture with a sliding window model. Based on this framework, extensive experiments are conducted on top of Apache Storm to verify the efficiency and scalability of the proposed algorithm.
ISSN:	2364-1185 2364-1541

Sliding Window Top-K Monitoring over Distributed Data Streams

Similar Items