Toward High-Performance Computing and Big Data Analytics Convergence: The Case of Spark-DIY


Bibliographic Details
Main Authors:	Silvina Caino-Lores, Jesus Carretero, Bogdan Nicolae, Orcun Yildiz, Tom Peterka
Format:	Article
Language:	English
Published:	IEEE 2019-01-01
Series:	IEEE Access
Subjects:	Big data analytics; high performance computing; spark; DIY; MPI
Online Access:	https://ieeexplore.ieee.org/document/8884083/

id	doaj-754198e84bc744878bf41e35187dbef7
record_format	Article
volume	7
pages	156929-156955
doi	10.1109/ACCESS.2019.2949836
orcid	Silvina Caino-Lores: https://orcid.org/0000-0002-6922-0138
affiliation	Silvina Caino-Lores, Jesus Carretero: Department of Computer Science and Engineering, Computer Architecture and Technology Area (ARCOS), University Carlos III of Madrid, Leganés, Spain
affiliation	Bogdan Nicolae, Orcun Yildiz, Tom Peterka: Mathematics and Computer Science Division, Argonne National Laboratory, Lemont, IL, USA
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Silvina Caino-Lores, Jesus Carretero, Bogdan Nicolae, Orcun Yildiz, Tom Peterka
publisher	IEEE
series	IEEE Access
issn	2169-3536
publishDate	2019-01-01
description	Convergence between high-performance computing (HPC) and big data analytics (BDA) is currently an established research area that has spawned new opportunities for unifying the platform layer and data abstractions in these ecosystems. This work presents an architectural model that enables the interoperability of established BDA and HPC execution models, reflecting the key design features that interest both the HPC and BDA communities, and including an abstract data collection and operational model that generates a unified interface for hybrid applications. This architecture can be implemented in different ways depending on the process- and data-centric platforms of choice and the mechanisms put in place to effectively meet the requirements of the architecture. The Spark-DIY platform is introduced in the paper as a prototype implementation of the architecture proposed. It preserves the interfaces and execution environment of the popular BDA platform Apache Spark, making it compatible with any Spark-based application and tool, while providing efficient communication and kernel execution via DIY, a powerful communication pattern library built on top of MPI. Later, Spark-DIY is analyzed in terms of performance by building a representative use case from the hydrogeology domain, EnKF-HGS. This application is a clear example of how current HPC simulations are evolving toward hybrid HPC-BDA applications, integrating HPC simulations within a BDA environment.
topic	Big data analytics; high performance computing; spark; DIY; MPI
url	https://ieeexplore.ieee.org/document/8884083/

Similar Items