A Balanced Partitioning Mechanism with Condensed, Collapsed Trie in MapReduce

Bibliographic Details
Main Authors: Syu-Huan Chen, 陳旭洹
Other Authors: Hsing-Lung Chen
Format: Others
Language: zh-TW
Published: 2018
Online Access: http://ndltd.ncl.edu.tw/handle/49897a
Description
Summary: Master's thesis === National Taiwan University of Science and Technology === Department of Electronic Engineering === 106 === MapReduce has emerged as an efficient platform for coping with big data. It achieves this by decoupling the data and distributing the workloads to multiple reducers, which process them fully in parallel. On skewed data, however, the hash function of MapReduce usually assigns unbalanced workloads to the reducers. This imbalance significantly degrades MapReduce performance, because the overall running time of a map-reduce cycle is determined by the longest-running reducer. It is therefore important to develop a balanced partitioning algorithm that divides the workload evenly among all reducers. This thesis proposes a balanced partitioning mechanism with a condensed trie in MapReduce, which distributes the data evenly to the reducers. It then proposes a quasi-optimal packing algorithm that assigns sub-partitions to the reducers evenly, thereby reducing the total execution time. The proposed partitioning mechanism requires a reasonable amount of memory and incurs a small running overhead. Experiments using inverted indexing on several real-world datasets are conducted to evaluate the performance of the proposed partitioning mechanism.
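The idea of packing sub-partitions onto reducers can be illustrated with a minimal sketch. The record does not describe the thesis's quasi-optimal packing algorithm, so the code below substitutes a generic greedy heuristic (largest sub-partition first, always onto the currently lightest reducer). The function name `pack_subpartitions`, the sub-partition labels, and the sizes are hypothetical and chosen only for illustration.

```python
import heapq


def pack_subpartitions(sizes, num_reducers):
    """Assign sub-partitions to reducers with a greedy heuristic.

    Assumption: this is a generic largest-first packing used as a stand-in,
    not the quasi-optimal algorithm proposed in the thesis.
    `sizes` maps a sub-partition id to its estimated workload.
    """
    # Min-heap of (current load, reducer id); the lightest reducer is on top.
    heap = [(0, r) for r in range(num_reducers)]
    heapq.heapify(heap)
    assignment = {r: [] for r in range(num_reducers)}

    # Place the largest sub-partitions first, each on the lightest reducer.
    for sub_id, size in sorted(sizes.items(), key=lambda kv: kv[1], reverse=True):
        load, reducer = heapq.heappop(heap)
        assignment[reducer].append(sub_id)
        heapq.heappush(heap, (load + size, reducer))

    loads = {r: sum(sizes[s] for s in subs) for r, subs in assignment.items()}
    return assignment, loads


# Hypothetical skewed workload: one sub-partition dominates the others.
sizes = {"a": 50, "b": 30, "c": 20, "d": 20, "e": 10, "f": 10}
assignment, loads = pack_subpartitions(sizes, 3)
print(assignment)
print(loads)
```

Because the cycle time is set by the most heavily loaded reducer, the quantity to watch in the output is the maximum value in `loads`, not the average.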