Best Trade-Off Point Method for Efficient Resource Provisioning in Spark

Considering the recent exponential growth in the amount of information processed in Big Data, the high energy consumed by data processing engines in datacenters has become a major issue, underlining the need for efficient resource allocation for more energy-efficient computing. We previously propose...

Bibliographic Details
Main Author: Peter P. Nghiem
Format: Article
Language: English
Published: MDPI AG 2018-11-01
Series: Algorithms
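Subjects: Apache Spark; Hadoop MapReduce; YARN; algorithm for best trade-off point; optimization; resource provisioning; performance efficiency; energy efficiency; elbow curve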
Online Access: https://www.mdpi.com/1999-4893/11/12/190
id doaj-c8afa9777c6c4180ade85387243d5f4d
record_format Article
spelling doaj-c8afa9777c6c4180ade85387243d5f4d; 2020-11-24T21:35:10Z; eng; MDPI AG; Algorithms; 1999-4893; 2018-11-01; 11; 12; 190; 10.3390/a11120190; a11120190; Best Trade-Off Point Method for Efficient Resource Provisioning in Spark; Peter P. Nghiem; Department of Computer Engineering, School of Engineering, Santa Clara University, 500 El Camino Real, Santa Clara, CA 95053, USA; https://www.mdpi.com/1999-4893/11/12/190
collection DOAJ
language English
format Article
sources DOAJ
author Peter P. Nghiem
title Best Trade-Off Point Method for Efficient Resource Provisioning in Spark
publisher MDPI AG
series Algorithms
issn 1999-4893
publishDate 2018-11-01
description Considering the recent exponential growth in the amount of information processed in Big Data, the high energy consumed by data processing engines in datacenters has become a major issue, underlining the need for efficient resource allocation for more energy-efficient computing. We previously proposed the Best Trade-off Point (BToP) method, which provides a general approach and techniques based on an algorithm with mathematical formulas to find the best trade-off point on an elbow curve of performance vs. resources for efficient resource provisioning in Hadoop MapReduce. The BToP method is expected to work for any application or system which relies on a trade-off elbow curve, non-inverted or inverted, for making good decisions. In this paper, we apply the BToP method to the emerging cluster computing framework, Apache Spark, and show that its performance and energy consumption are better than Spark with its built-in dynamic resource allocation enabled. Our Spark-Bench tests confirm the effectiveness of using the BToP method with Spark to determine the optimal number of executors for any workload in production environments where job profiling for behavioral replication will lead to the most efficient resource provisioning.
topic Apache Spark
Hadoop MapReduce
YARN
algorithm for best trade-off point
optimization
resource provisioning
performance efficiency
energy efficiency
elbow curve
url https://www.mdpi.com/1999-4893/11/12/190
_version_ 1725946290962956288