Comparative Analysis of Skew-Join Strategies for Large-Scale Datasets with MapReduce and Spark

In the era of data deluge, Big Data offers numerous opportunities but also poses significant challenges to conventional data processing and analysis methods. MapReduce has become a prominent parallel and distributed programming model for efficiently handling such massive datasets. One of...

Bibliographic Details
Main Authors:	Cao, H.-P (Author), Phan, A.-C (Author), Phan, T.-C (Author), Trieu, T.-N (Author)
Format:	Article
Language:	English
Published:	MDPI 2022
Subjects:	Apache Spark big data analytics MapReduce skew join
Online Access:	View Fulltext in Publisher

Similar Items

Optimization for big joins and recursive query evaluation using intersection and difference filters in MapReduce
by: Phan, Thuong-Cang
Published: (2014)

Time Estimation and Resource Minimization Scheme for Apache Spark and Hadoop Big Data Systems With Failures
by: Jinbae Lee, et al.
Published: (2019-01-01)

Large Scale Implementations for Twitter Sentiment Classification
by: Andreas Kanavos, et al.
Published: (2017-03-01)

MAPSkew: Metaheuristic Approaches for Partitioning Skew in MapReduce
by: Matheus H. M. Pericini, et al.
Published: (2018-12-01)

Skewness-Based Partitioning in SpatialHadoop
by: Alberto Belussi, et al.
Published: (2020-03-01)

Improving MapReduce Performance on Clusters
by: Gault, Sylvain
Published: (2015)

A comprehensive performance analysis of Apache Hadoop and Apache Spark for large scale data sets using HiBench
by: N. Ahmed, et al.
Published: (2020-12-01)

Utilização de metaheurísticas para balanceamento de carga em ambientes MapReduce
by: Pericini, Matheus Henrique Machado
Published: (2017)

Experimenting sensitivity-based anonymization framework in apache spark
by: Mohammed Al-Zobbi, et al.
Published: (2018-10-01)

IoT-enabled directed acyclic graph in spark cluster
by: Jahwan Koo, et al.
Published: (2020-09-01)

Extending the Growing Hierarchical Self Organizing Maps for a Large Mixed-Attribute Dataset Using Spark MapReduce
by: Malondkar, Ameya Mohan
Published: (2015)

Scalable Embeddings for Kernel Clustering on MapReduce
by: Elgohary, Ahmed
Published: (2014)

An Efficient Platform for Large-Scale MapReduce Processing
by: Wang, Liqiang
Published: (2009)

PROGRESSIVE DATA ANALYTICS IN HEALTH INFORMATICS USING AMAZON ELASTIC MAPREDUCE (EMR)
by: J S Shyam Mohan, et al.
Published: (2016-04-01)

On using MapReduce to scale algorithms for Big Data analytics: a case study
by: Phongphun Kijsanayothin, et al.
Published: (2019-11-01)

Secure Distributed MapReduce Protocols : How to have privacy-preserving cloud applications?
by: Giraud, Matthieu
Published: (2019)

A MapReduce Approach for Traffic Matrix Estimation in SDN
by: Wander J. Queiroz, et al.
Published: (2020-01-01)

Large Scale Product Recommendation of Supermarket Ware Based on Customer Behaviour Analysis
by: Andreas Kanavos, et al.
Published: (2018-05-01)

Scaling associative classification for very large datasets
by: Luca Venturini, et al.
Published: (2017-12-01)

CloudNMF: A MapReduce Implementation of Nonnegative Matrix Factorization for Large-scale Biological Datasets
by: Ruiqi Liao, et al.
Published: (2014-02-01)

Performance assessment of Apache Spark applications
by: AL Jorani, Salam
Published: (2019)

Acerca de la aplicación de MapReduce + Hadoop en el tratamiento de Big Data
by: Antonio Hernández Dominguez, et al.
Published: (2015-07-01)

A Feature Selection Method for Large-Scale Network Traffic Classification Based on Spark
by: Yong Wang, et al.
Published: (2016-02-01)

Optimization of Real-World MapReduce Applications With Flame-MR: Practical Use Cases
by: Jorge Veiga, et al.
Published: (2018-01-01)

Deploying Apache Spark virtual clusters in cloud environments using orchestration technologies
by: O. Borisenko, et al.
Published: (2018-10-01)

分散式計算系統及巨量資料處理架構設計-基於YARN, Storm及Spark
by: 曾柏崴, et al.

以MapReduce做有效率的天際線查詢
by: 陳家慶, et al.

High-Performance Geospatial Big Data Processing System Based on MapReduce
by: Junghee Jo, et al.
Published: (2018-10-01)

MR-Tree - A Scalable MapReduce Algorithm for Building Decision Trees
by: Vasile PURDILĂ, et al.
Published: (2014-03-01)

Uma análise comparativa de ambientes para Big Data: Apache Spark e HPAT
by: Rafael Aquino de Carvalho
Published: (2018)

MapReduce particle filtering with exact resampling and deterministic runtime
by: Jeyarajan Thiyagalingam, et al.
Published: (2017-10-01)

An Intelligent and Time-Efficient DDoS Identification Framework for Real-Time Enterprise Networks: SAD-F: Spark Based Anomaly Detection Framework
by: Awais Ahmed, et al.
Published: (2020-01-01)

Scheduling Spark Tasks With Data Skew and Deadline Constraints
by: Haihua Gu, et al.
Published: (2021-01-01)

Utilizing MapReduce to Improve Probe-Car Track Data Mining
by: Li Zheng, et al.
Published: (2018-07-01)

Big Data Analytics in Healthcare using Machine Learning Algorithms: A Comparative Study
by: Sai Hanuman Akundi, et al.
Published: (2020-11-01)

Energy-efficient Straggler Mitigation for Big Data Applications on the Clouds
by: Phan, Tien-Dat
Published: (2017)

Towards a Virtual Domain Based Authentication on MapReduce
by: Ibrahim Lahmer, et al.
Published: (2016-01-01)

PERIDOT: Modeling Execution Time of Spark Applications
by: Sarah Shah, et al.
Published: (2021-01-01)
