Estimating runtime of a job in Hadoop MapReduce

Abstract: Hadoop MapReduce is a framework for processing vast amounts of data on a cluster of machines in a reliable and fault-tolerant manner. Because knowing the runtime of a job is crucial to the platform's subsequent decisions and to better management, in this paper we propose a new metho...

Bibliographic Details
Main Authors:	Narges Peyravi, Ali Moeini
Format:	Article
Language:	English
Published:	SpringerOpen 2020-07-01
Series:	Journal of Big Data
Subjects:	Hadoop; MapReduce; Runtime of a job; Estimating the runtime
Online Access:	http://link.springer.com/article/10.1186/s40537-020-00319-4

id	doaj-a16e253c6f9f4bd885e1aa4585dac27d
record_format	Article
spelling	doaj-a16e253c6f9f4bd885e1aa4585dac27d; 2020-11-25T03:52:13Z; eng; SpringerOpen; Journal of Big Data; 2196-1115; 2020-07-01; 7; 1; 1; 18; 10.1186/s40537-020-00319-4; Estimating runtime of a job in Hadoop MapReduce; Narges Peyravi (0); Ali Moeini (1); (0) Department of Computer Engineering and Information Technology, Faculty of Engineering, University of Qom; (1) Department of Algorithms and Computation, School of Engineering Science, College of Engineering, University of Tehran; Abstract: Hadoop MapReduce is a framework for processing vast amounts of data on a cluster of machines in a reliable and fault-tolerant manner. Because knowing the runtime of a job is crucial to the platform's subsequent decisions and to better management, in this paper we propose a new method to estimate the runtime of a job. For this purpose, after precisely analyzing the anatomy of job processing in Hadoop MapReduce, we consider two cases: a job that runs for the first time and a job that has run previously. In the first case, we consider the essential parameters that have a higher impact on runtime, formulate each phase of the Hadoop execution pipeline, and state the phases as mathematical expressions to calculate the runtime of a job. In the second case, the runtime is estimated by referring to the profile or history of the job in the database and applying a weighting system. The results show that the average error rate of the runtime estimate is less than 12% for the first run and less than 8.5% when a profile or history of the job exists.; http://link.springer.com/article/10.1186/s40537-020-00319-4; Hadoop; MapReduce; Runtime of a job; Estimating the runtime
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Narges Peyravi Ali Moeini
author_facet	Narges Peyravi Ali Moeini
author_sort	Narges Peyravi
title	Estimating runtime of a job in Hadoop MapReduce
publisher	SpringerOpen
series	Journal of Big Data
issn	2196-1115
publishDate	2020-07-01
description	Abstract: Hadoop MapReduce is a framework for processing vast amounts of data on a cluster of machines in a reliable and fault-tolerant manner. Because knowing the runtime of a job is crucial to the platform's subsequent decisions and to better management, in this paper we propose a new method to estimate the runtime of a job. For this purpose, after precisely analyzing the anatomy of job processing in Hadoop MapReduce, we consider two cases: a job that runs for the first time and a job that has run previously. In the first case, we consider the essential parameters that have a higher impact on runtime, formulate each phase of the Hadoop execution pipeline, and state the phases as mathematical expressions to calculate the runtime of a job. In the second case, the runtime is estimated by referring to the profile or history of the job in the database and applying a weighting system. The results show that the average error rate of the runtime estimate is less than 12% for the first run and less than 8.5% when a profile or history of the job exists.
topic	Hadoop; MapReduce; Runtime of a job; Estimating the runtime
url	http://link.springer.com/article/10.1186/s40537-020-00319-4

Similar Items