Estimating runtime of a job in Hadoop MapReduce
Abstract Hadoop MapReduce is a framework for processing vast amounts of data on a cluster of machines in a reliable and fault-tolerant manner. Because knowing the runtime of a job is crucial to the platform's subsequent decisions and to better management, in this paper we propose a new metho...
Main Authors: | Narges Peyravi, Ali Moeini |
---|---|
Format: | Article |
Language: | English |
Published: | SpringerOpen 2020-07-01 |
Series: | Journal of Big Data |
Subjects: | Hadoop; MapReduce; Runtime of a job; Estimating the runtime |
Online Access: | http://link.springer.com/article/10.1186/s40537-020-00319-4 |
id |
doaj-a16e253c6f9f4bd885e1aa4585dac27d |
---|---|
record_format |
Article |
spelling |
doaj-a16e253c6f9f4bd885e1aa4585dac27d; 2020-11-25T03:52:13Z; eng; SpringerOpen; Journal of Big Data; 2196-1115; 2020-07-01; 7; 1; 1; 18; 10.1186/s40537-020-00319-4; Estimating runtime of a job in Hadoop MapReduce; Narges Peyravi (0); Ali Moeini (1); (0) Department of Computer Engineering and Information Technology, Faculty of Engineering, University of Qom; (1) Department of Algorithms and Computation, School of Engineering Science, College of Engineering, University of Tehran; http://link.springer.com/article/10.1186/s40537-020-00319-4; Hadoop; MapReduce; Runtime of a job; Estimating the runtime |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Narges Peyravi; Ali Moeini |
author_sort |
Narges Peyravi |
title |
Estimating runtime of a job in Hadoop MapReduce |
publisher |
SpringerOpen |
series |
Journal of Big Data |
issn |
2196-1115 |
publishDate |
2020-07-01 |
description |
Abstract Hadoop MapReduce is a framework for processing vast amounts of data on a cluster of machines in a reliable and fault-tolerant manner. Because knowing the runtime of a job is crucial to the platform's subsequent decisions and to better management, in this paper we propose a new method to estimate the runtime of a job. To this end, after precisely analyzing the anatomy of job processing in Hadoop MapReduce, we consider two cases: a job that runs for the first time, and a job that has run previously. In the first case, we consider the essential parameters that have a higher impact on runtime, formulate each phase of the Hadoop execution pipeline, and state the phases as mathematical expressions to calculate the runtime of a job. In the second case, the runtime is estimated by referring to the profile or history of the job in the database and applying a weighting system. The results show that the average error rate is less than 12% when estimating the runtime of a first run and less than 8.5% when a profile or history of the job exists. |
topic |
Hadoop; MapReduce; Runtime of a job; Estimating the runtime |
url |
http://link.springer.com/article/10.1186/s40537-020-00319-4 |
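
The second case described in the abstract, estimating runtime from a job's profile or history with a weighting system, can be illustrated with a small sketch. This record does not describe the paper's actual weights, so the recency-based geometric weighting, the `decay` parameter, and the function names `estimate_runtime` and `relative_error` are assumptions made purely for illustration, not the authors' method.

```python
from typing import Sequence


def estimate_runtime(past_runtimes: Sequence[float], decay: float = 0.5) -> float:
    """Weighted average of past runtimes (seconds), newest run weighted most.

    Hypothetical weighting: the most recent run gets weight 1, the run
    before it `decay`, then `decay**2`, and so on. This is an assumed
    scheme for illustration, not the one used in the paper.
    """
    if not past_runtimes:
        raise ValueError("no history: a first-run model is needed instead")
    if not 0.0 < decay <= 1.0:
        raise ValueError("decay must be in (0, 1]")
    n = len(past_runtimes)
    # past_runtimes is ordered oldest -> newest, so the last entry gets decay**0.
    weights = [decay ** (n - 1 - i) for i in range(n)]
    weighted_sum = sum(w * t for w, t in zip(weights, past_runtimes))
    return weighted_sum / sum(weights)


def relative_error(estimated: float, actual: float) -> float:
    """Relative error of an estimate, as a fraction of the actual runtime."""
    return abs(estimated - actual) / actual


if __name__ == "__main__":
    history = [100.0, 120.0, 110.0]  # made-up runtimes, oldest first
    est = estimate_runtime(history)
    print(f"estimate: {est:.2f} s")
    print(f"relative error vs. a 112 s run: {relative_error(est, 112.0):.1%}")
```

With `decay=1.0` the estimate reduces to a plain mean of the history; smaller values let recent runs dominate, which may suit clusters whose configuration changes over time.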