Data Processing Model to Perform Big Data Analytics in Hybrid Infrastructures

Bibliographic Details
Main Authors: Julio C. S. Dos Anjos, Kassiano J. Matteussi, Paulo R. R. De Souza, Gabriel J. A. Grabher, Guilherme A. Borges, Jorge L. V. Barbosa, Gabriel V. Gonzalez, Valderi R. Q. Leithardt, Claudio F. R. Geyer
Format: Article
Language: English
Published: IEEE 2020-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/9193894/
Record ID: doaj-30312a9814ec4b389dbd18097437aff8
DOI: 10.1109/ACCESS.2020.3023344
ISSN: 2169-3536
Volume/Pages: 8, 170281-170294
Subjects: Big data analytics; cloud computing; hybrid infrastructures; MapReduce; volunteer computing
Affiliations:
Julio C. S. Dos Anjos (ORCID 0000-0003-3623-2762), Kassiano J. Matteussi (ORCID 0000-0002-9131-6849), Paulo R. R. De Souza, Gabriel J. A. Grabher (ORCID 0000-0001-9415-7591), Guilherme A. Borges, Claudio F. R. Geyer: UFRGS/PPGC, Federal University of Rio Grande do Sul, Institute of Informatics, Porto Alegre, Brazil
Jorge L. V. Barbosa (ORCID 0000-0002-0358-2056): UNISINOS/PPGCA, University of Vale do Rio dos Sinos, São Leopoldo, Brazil
Gabriel V. Gonzalez: Faculty of Science, Expert Systems and Applications Laboratory, University of Salamanca, Salamanca, Spain
Valderi R. Q. Leithardt: VALORIZA Research Center, Instituto Politécnico de Portalegre, Portalegre, Portugal
Description: Big Data applications are present in many areas, such as financial markets, search engines, streaming services, health care, and social networks. Data analysis adds value to information for organizations. Classical Cloud Computing provides a robust architecture for complex, large-scale computing in these areas. The main challenges are users' lack of knowledge about the Cloud infrastructure, the requirements for improving performance, and the resource management needed to maintain stable processing. Given these difficulties, an inadequate solution can lead users to overestimate or underestimate the number of computational resources required, which increases the budget. One way to work around this problem is to use Volunteer Computing, since it provides distributed computational resources at no monetary cost. However, volatile machine behavior is a problem that must be addressed when distributing data for Big Data applications. Thus, this work proposes a data distribution model that combines Cloud Computing and Volunteer Computing environments in a hybrid fashion for Big Data analytics. The contributions of this work are: i) the evaluation required to enable efficient deployment of Big Data applications in hybrid infrastructures; ii) the development of the HR_Alloc algorithm, which establishes data placement for Big Data applications; iii) a model for resource allocation in hybrid infrastructures. The results indicate that a hybrid infrastructure with up to 35% unstable machines is feasible in the worst-case scenario, without loss of performance and with a monetary cost lower than 20% in comparison to Classical Cloud Computing. Also, communication costs decrease by up to 57.14% in the best-case scenario due to load balancing.
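The record does not describe how HR_Alloc decides data placement. As a rough illustration of the general idea of volatility-aware placement in a hybrid cloud/volunteer cluster, the sketch below assigns data blocks to nodes in proportion to capacity times estimated availability, so that volatile volunteer nodes receive fewer blocks than stable cloud nodes. This is not the HR_Alloc algorithm; the `Node` fields, the weighting rule, the `place_blocks` function, and the node names are hypothetical assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """A worker in a hypothetical hybrid cluster (field names are illustrative)."""
    name: str
    capacity: float      # relative processing capacity (assumed unit-less)
    availability: float  # estimated fraction of time the node stays online, 0..1


def place_blocks(nodes: list[Node], n_blocks: int) -> dict[str, int]:
    """Split n_blocks data blocks across nodes in proportion to capacity * availability.

    Hypothetical policy, not HR_Alloc. Largest-remainder rounding guarantees
    that the per-node counts sum exactly to n_blocks.
    """
    weights = [n.capacity * n.availability for n in nodes]
    total = sum(weights)
    if total <= 0:
        raise ValueError("no node has usable capacity")
    # Ideal (fractional) share of blocks for each node.
    quotas = [w * n_blocks / total for w in weights]
    counts = [int(q) for q in quotas]  # floor, since quotas are non-negative
    # Hand the remaining blocks to the nodes with the largest fractional parts.
    leftover = n_blocks - sum(counts)
    by_remainder = sorted(range(len(nodes)), key=lambda i: quotas[i] - counts[i], reverse=True)
    for i in by_remainder[:leftover]:
        counts[i] += 1
    return {n.name: c for n, c in zip(nodes, counts)}


if __name__ == "__main__":
    cluster = [
        Node("cloud-1", capacity=1.0, availability=1.0),  # stable cloud VM
        Node("cloud-2", capacity=1.0, availability=1.0),
        Node("vol-1", capacity=1.0, availability=0.5),    # volatile volunteer PC
        Node("vol-2", capacity=1.0, availability=0.5),
    ]
    print(place_blocks(cluster, 12))
    # prints {'cloud-1': 4, 'cloud-2': 4, 'vol-1': 2, 'vol-2': 2}
```

Weighting by availability is only one possible policy; a real deployment might instead replicate blocks stored on volunteer nodes or rebalance at runtime, and the actual placement strategy should be taken from the article itself.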