Estimating runtime of a job in Hadoop MapReduce

Abstract: Hadoop MapReduce is a framework for processing vast amounts of data on a cluster of machines in a reliable and fault-tolerant manner. Because knowing the runtime of a job is crucial to the platform's subsequent decisions and to better management, this paper proposes a new metho...

Bibliographic Details
Main Authors: Narges Peyravi, Ali Moeini
Format: Article
Language: English
Published: SpringerOpen 2020-07-01
Series: Journal of Big Data
Subjects: Hadoop; MapReduce; Runtime of a job; Estimating the runtime
Online Access: http://link.springer.com/article/10.1186/s40537-020-00319-4
id doaj-a16e253c6f9f4bd885e1aa4585dac27d
record_format Article
spelling doaj-a16e253c6f9f4bd885e1aa4585dac27d; 2020-11-25T03:52:13Z; eng; SpringerOpen; Journal of Big Data; ISSN 2196-1115; 2020-07-01; vol. 7, no. 1, pp. 1-18; DOI 10.1186/s40537-020-00319-4
Estimating runtime of a job in Hadoop MapReduce
Narges Peyravi (Department of Computer Engineering and Information Technology, Faculty of Engineering, University of Qom)
Ali Moeini (Department of Algorithms and Computation, School of Engineering Science, College of Engineering, University of Tehran)
collection DOAJ
issn 2196-1115
description Abstract: Hadoop MapReduce is a framework for processing vast amounts of data on a cluster of machines in a reliable and fault-tolerant manner. Because knowing the runtime of a job is crucial to the platform's subsequent decisions and to better management, this paper proposes a new method to estimate the runtime of a job. To this end, after precisely analyzing the anatomy of job processing in Hadoop MapReduce, we consider two cases: a job that runs for the first time and a job that has run previously. In the first case, we consider the essential parameters that have the greatest impact on runtime, formulate each phase of the Hadoop execution pipeline, and express the phases mathematically to calculate the runtime of a job. In the second case, the runtime is estimated by referring to the profile or history of the job in the database and applying a weighting system. The results show that the average estimation error is less than 12% for a first run and less than 8.5% when a profile or history of the job exists.
topic Hadoop
MapReduce
Runtime of a job
Estimating the runtime