Data Processing Model to Perform Big Data Analytics in Hybrid Infrastructures
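Main Authors: | Julio C. S. Dos Anjos, Kassiano J. Matteussi, Paulo R. R. De Souza, Gabriel J. A. Grabher, Guilherme A. Borges, Jorge L. V. Barbosa, Gabriel V. Gonzalez, Valderi R. Q. Leithardt, Claudio F. R. Geyer |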
---|---|
Format: | Article |
Language: | English |
Published: | IEEE, 2020-01-01 |
Series: | IEEE Access |
Subjects: | Big data analytics; cloud computing; hybrid infrastructures; MapReduce; volunteer computing |
Online Access: | https://ieeexplore.ieee.org/document/9193894/ |
id |
doaj-30312a9814ec4b389dbd18097437aff8 |
---|---|
record_format |
Article |
spelling |
doaj-30312a9814ec4b389dbd18097437aff8 (DOAJ record timestamp 2021-03-30T03:55:59Z); language: eng; IEEE, IEEE Access, ISSN 2169-3536; published 2020-01-01, vol. 8, pp. 170281-170294; DOI: 10.1109/ACCESS.2020.3023344; IEEE document 9193894.
Affiliations:
UFRGS/PPGC, Federal University of Rio Grande do Sul, Institute of Informatics, Porto Alegre, Brazil: Julio C. S. Dos Anjos, Kassiano J. Matteussi, Paulo R. R. De Souza, Gabriel J. A. Grabher, Guilherme A. Borges, Claudio F. R. Geyer
UNISINOS/PPGCA, University of Vale do Rio dos Sinos, São Leopoldo, Brazil: Jorge L. V. Barbosa
Faculty of Science, Expert Systems and Applications Laboratory, University of Salamanca, Salamanca, Spain: Gabriel V. Gonzalez
VALORIZA Research Center, Instituto Politécnico de Portalegre, Portalegre, Portugal: Valderi R. Q. Leithardt
ORCID iDs: Julio C. S. Dos Anjos https://orcid.org/0000-0003-3623-2762; Kassiano J. Matteussi https://orcid.org/0000-0002-9131-6849; Gabriel J. A. Grabher https://orcid.org/0000-0001-9415-7591; Jorge L. V. Barbosa https://orcid.org/0000-0002-0358-2056 |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Julio C. S. Dos Anjos, Kassiano J. Matteussi, Paulo R. R. De Souza, Gabriel J. A. Grabher, Guilherme A. Borges, Jorge L. V. Barbosa, Gabriel V. Gonzalez, Valderi R. Q. Leithardt, Claudio F. R. Geyer |
title |
Data Processing Model to Perform Big Data Analytics in Hybrid Infrastructures |
publisher |
IEEE |
series |
IEEE Access |
issn |
2169-3536 |
publishDate |
2020-01-01 |
description |
Big Data applications are present in many areas, such as financial markets, search engines, streaming services, health care, and social networks. Data analysis adds value to information for organizations. Classical Cloud Computing provides a robust architecture for complex, large-scale computing in these areas. The main challenges are users' lack of knowledge about the Cloud infrastructure, the requirements for improving performance, and the resource management needed to keep processing stable. Given these difficulties, an inadequate solution can lead users to overestimate or underestimate the number of computational resources they need, which increases the budget. One way to work around this problem is to use Volunteer Computing, since it provides distributed computational resources at no monetary cost. However, volatile machine behavior is a problem that data distribution for Big Data must address. This work therefore proposes a data distribution model that combines Cloud Computing and Volunteer Computing environments in a hybrid fashion for Big Data analytics. The contributions of this work are: i) the evaluation required to enable efficient deployment of Big Data applications in hybrid infrastructures; ii) the development of the HR_Alloc algorithm, which establishes data placement for Big Data applications; and iii) a resource allocation model for hybrid infrastructures. The results indicate that a hybrid infrastructure with up to 35% unstable machines is feasible in the worst-case scenario, without performance loss and with a monetary cost lower than 20% in comparison to Classical Cloud Computing. Communication costs also decrease by up to 57.14% in the best-case scenario due to load balancing. |
topic |
Big data analytics; cloud computing; hybrid infrastructures; MapReduce; volunteer computing |
url |
https://ieeexplore.ieee.org/document/9193894/ |