Scheduling in distributed stream processing systems


Bibliographic Details
Main Author: Khorlin, Andrey
Format: Others
Published: 2006
Online Access: https://thesis.library.caltech.edu/2012/1/thesis.pdf
Khorlin, Andrey (2006) Scheduling in distributed stream processing systems. Master's thesis, California Institute of Technology. doi:10.7907/4MH9-9104. https://resolver.caltech.edu/CaltechETD:etd-05242006-175006
Description
Summary: Stream processing systems receive continuous streams of messages with relatively raw information and produce streams of messages with processed information. The utility of a stream processing system depends, in part, on the accuracy and timeliness of the output. Streams in complex event processing systems are processed on distributed systems: several steps are taken on different processors to process each incoming message, and messages may be enqueued between steps. This work explores the problem of distributed dynamic control of streams to optimize the total utility provided by the system. A system can be controlled using central control or distributed control. In the former case, a single central controller maintains the state of the entire system and controls the operation of all processors. In the latter case, each processor controls itself based on its own state and information from other processors. A challenge of distributed control is that the timeliness of output depends only on the total end-to-end time and is otherwise independent of the delay at each separate processor, whereas the controller for each processor acts only on the steps running on that processor and cannot directly control the entire network. In this work, we discuss a framework for the design and analysis of control-based scheduling algorithms for a distributed stream processing system and illustrate the framework with two concrete scheduling algorithms.
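
The challenge named in the summary, that timeliness depends only on the total end-to-end time and not on how that time is split across processors, can be illustrated with a minimal Python sketch. It is not taken from the thesis: the function names, the 10-unit deadline, and the linear-decay utility curve are all hypothetical choices made for illustration.

```python
from typing import Sequence


def end_to_end_delay(stage_delays: Sequence[float]) -> float:
    """Total time a message spends in the pipeline.

    Each entry is the delay (queueing plus processing) at one processor.
    """
    return sum(stage_delays)


def utility(total_delay: float, deadline: float = 10.0) -> float:
    """Hypothetical utility curve (an assumption, not the thesis's model).

    Full value up to the deadline, then a linear decay that reaches zero
    at twice the deadline.
    """
    if total_delay <= deadline:
        return 1.0
    return max(0.0, 1.0 - (total_delay - deadline) / deadline)


# Two different splits of the same total delay across three processors.
slow_middle = [1.0, 8.0, 1.0]
balanced = [4.0, 3.0, 3.0]

# Utility sees only the sum, so both splits are equally good.
assert utility(end_to_end_delay(slow_middle)) == utility(end_to_end_delay(balanced))
```

Under these assumptions, a controller on the second processor that observes only its own 8-unit delay cannot tell whether the message will still meet the deadline; that depends on the delays at the other processors, which is why each processor also needs information from the others.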