Towards Unifying Stream Processing over Central and Near-the-Edge Data Centers

Bibliographic Details
Main Author: Peiro Sajjad, Hooman
Format: Others
Language: English
Published: KTH, Programvaruteknik och Datorsystem, SCS (Software and Computer Systems), 2016
Online Access: http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-193582
http://nbn-resolving.de/urn:isbn:978-91-7729-118-3
id ndltd-UPSALLA1-oai-DiVA.org-kth-193582
record_format oai_dc
spelling ndltd-UPSALLA1-oai-DiVA.org-kth-193582 (2018-01-15T07:13:06Z) | Stockholm, 2016 | Licentiate thesis, comprehensive summary (info:eu-repo/semantics/masterThesis) | TRITA-ICT ; 2016:27 | text, application/pdf | info:eu-repo/semantics/openAccess
collection NDLTD
sources NDLTD
topic geo-distributed stream processing
geo-distributed infrastructure
edge computing
edge-based analytics
Computer and Information Sciences (Swedish: Data- och informationsvetenskap)
description In this thesis, our goal is to enable effective and efficient real-time stream processing in a geo-distributed infrastructure by combining the power of central data centers and micro data centers. Our research focuses on the challenges of distributing stream processing applications and placing them closer to data sources and sinks. We enable applications to run in a geo-distributed setting and provide solutions for the network-aware placement of distributed stream processing applications across geo-distributed infrastructures. First, we evaluate Apache Storm, a widely used open-source distributed stream processing system, in the community network cloud, as an example of a geo-distributed infrastructure. Our evaluation exposes new requirements that stream processing systems must meet to function in a geo-distributed infrastructure. Second, we propose a solution to facilitate the optimal placement of stream processing components on geo-distributed infrastructures. We present a novel method for partitioning a geo-distributed infrastructure into a set of computing clusters, each called a micro data center. According to our results, we can increase the minimum available bandwidth in the network and reduce the average latency to less than 50%. Next, we propose HoVerCut, a parallel and distributed graph partitioner for fast partitioning of streaming graphs. Since much data can be represented as graphs, graph partitioning can be used to assign graph elements to different data centers, providing data locality for efficient processing. Last, we present SpanEdge, an approach that enables stream processing systems to work on a geo-distributed infrastructure. SpanEdge unifies stream processing over the central and near-the-edge data centers (micro data centers). As a proof of concept, we implement SpanEdge as an extension of Apache Storm that enables Storm to run across multiple data centers.
QC 20161005
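The description notes that graph partitioning can assign graph elements to different data centers to provide data locality. The sketch below is a minimal, single-process illustration of that idea, assuming a simplified greedy streaming vertex-cut heuristic loosely following PowerGraph's greedy edge placement. It is not HoVerCut, and its parallel and distributed design is not reproduced here. Partitions stand in for data centers, and the names `greedy_vertex_cut` and `replication_factor` are hypothetical.

```python
# Hypothetical sketch: a simplified greedy streaming vertex-cut partitioner.
# It is NOT HoVerCut; it only illustrates assigning streamed edges to k
# partitions (e.g. data centers) while keeping vertex replicas few.
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[int, int]


def greedy_vertex_cut(edges: Iterable[Edge], k: int) -> Tuple[List[int], Dict[int, Set[int]]]:
    """Assign each edge, in stream order, to one of k partitions."""
    load = [0] * k                                     # edges held by each partition
    replicas: Dict[int, Set[int]] = defaultdict(set)  # vertex -> partitions holding a replica
    assignment: List[int] = []

    def least_loaded(candidates: Iterable[int]) -> int:
        # Lowest edge load wins; lowest partition id breaks ties deterministically.
        return min(candidates, key=lambda p: (load[p], p))

    for u, v in edges:
        au, av = replicas[u], replicas[v]
        common = au & av
        if common:
            p = least_loaded(common)      # both endpoints already there: no new replica
        elif au and av:
            p = least_loaded(au | av)     # disjoint placements: one endpoint gets a new replica
        elif au or av:
            p = least_loaded(au or av)    # follow the endpoint that is already placed
        else:
            p = least_loaded(range(k))    # both endpoints unseen: balance load
        load[p] += 1
        replicas[u].add(p)
        replicas[v].add(p)
        assignment.append(p)
    return assignment, dict(replicas)


def replication_factor(replicas: Dict[int, Set[int]]) -> float:
    """Average number of partitions per vertex (1.0 means no vertex is cut)."""
    return sum(len(ps) for ps in replicas.values()) / len(replicas)
```

For example, streaming two disjoint triangles, `[(1, 2), (2, 3), (3, 1), (4, 5), (5, 6), (6, 4)]`, into `k=2` partitions places each triangle entirely in one partition (assignment `[0, 0, 0, 1, 1, 1]`, replication factor `1.0`), so no vertex state has to be shared across data centers. Greedy heuristics of this kind trade load balance for locality; a production partitioner would also bound partition sizes.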