A parallelization model for performance characterization of Spark Big Data jobs on Hadoop clusters

Abstract: This article proposes a new parallel performance model for different workloads of Spark Big Data applications running on Hadoop clusters. The proposed model can predict the runtime for generic workloads as a function of the number of executors, without necessarily knowing how the algorithms...

Bibliographic Details
Main Authors: N. Ahmed, Andre L. C. Barczak, Mohammad A. Rashid, Teo Susnjak
Format: Article
Language: English
Published: SpringerOpen 2021-08-01
Series: Journal of Big Data
Online Access: https://doi.org/10.1186/s40537-021-00499-7