Efficient and Cost-effective Workflow Based on Containers for Distributed Reproducible Experiments

Bibliographic Details
Main Author: Perera, Shelan
Format: Others
Language: English
Published: KTH, Skolan för informations- och kommunikationsteknik (ICT), 2016
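Subjects: docker, orchestration, container, workflow, cloud, reproducible-experiments, Computer Systems, Datorsystem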
Online Access: http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-194209
id ndltd-UPSALLA1-oai-DiVA.org-kth-194209
record_format oai_dc
datestamp 2017-04-25T05:42:41Z
type Student thesis; info:eu-repo/semantics/bachelorThesis; text
identifier TRITA-ICT-EX ; 2016:125
mimetype application/pdf
rights info:eu-repo/semantics/openAccess
collection NDLTD
language English
format Others
sources NDLTD
topic docker
orchestration
container
workflow
cloud
reproducible-experiments
Computer Systems
Datorsystem
description Reproducing distributed experiments is a challenging task for many researchers, and many factors make the problem hard to solve. To reproduce distributed experiments, researchers need to perform complex deployments that involve many dependent software stacks, many configurations, and manual orchestration. Researchers also need to allocate a large amount of money for clusters of machines and then spend valuable time performing those experiments. In addition, some researchers spend a lot of time validating a distributed scenario in a real environment, as most pseudo-distributed systems do not provide the characteristics of a real distributed system. Karamel addresses the inconvenience of manual orchestration by providing a comprehensive orchestration platform to deploy and run distributed experiments. However, this solution may still incur expenses similar to those of a manual distributed setup, since it uses virtual machines underneath. Furthermore, it does not allow quick validation of a distributed setup with a short feedback loop, as terminating and provisioning new virtual machines takes considerable time. Therefore, we provide a solution that integrates Docker and can coexist seamlessly with the virtual-machine-based deployment model. Our solution encapsulates the container-based deployment model so that users can reproduce distributed experiments in a cost-effective and efficient manner. In this project, we introduce a novel deployment model with containers that is not possible with the conventional virtual-machine-based deployment model. We also evaluate our solution with a real deployment of an Apache Hadoop TeraSort experiment, a benchmark for the Apache Hadoop MapReduce platform, to explain how this model can be used to reduce cost and improve efficiency.
author Perera, Shelan
title Efficient and Cost-effective Workflow Based on Containers for Distributed Reproducible Experiments
publisher KTH, Skolan för informations- och kommunikationsteknik (ICT)
publishDate 2016
url http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-194209