Summary: | Master's === National Taiwan University of Science and Technology === Department of Electronic Engineering === 106 === MapReduce has emerged as an efficient platform for coping with
big data. It achieves this goal by partitioning the data and then distributing
the workloads to multiple reducers for processing in a fully parallel manner.
For skewed data, however, the hash function of MapReduce usually assigns
unbalanced workloads to the reducers. This imbalance significantly degrades
the performance of MapReduce, because the overall running time of a
map-reduce cycle is determined by the longest-running reducer. Thus, it is
important to develop a balanced partitioning algorithm that distributes the
workloads evenly across all the reducers.
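The effect of skew on hash partitioning can be illustrated with a minimal sketch. The CRC32-based partitioner, the function names, and the input data below are assumptions chosen for illustration; they are not the partitioner or the datasets used in this work.

```python
import zlib


def hash_partition(key: str, num_reducers: int) -> int:
    """Deterministic stand-in for a hash partitioner (CRC32 modulo reducer count)."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def reducer_loads(records, num_reducers):
    """Count how many records each reducer receives under hash partitioning."""
    loads = [0] * num_reducers
    for key in records:
        loads[hash_partition(key, num_reducers)] += 1
    return loads


# Hypothetical skewed input: one hot key dominates the data set.
records = ["the"] * 900 + ["trie"] * 40 + ["hash"] * 30 + ["skew"] * 30
num_reducers = 4
loads = reducer_loads(records, num_reducers)

# All records with the same key go to the same reducer, so the reducer that
# owns the hot key receives at least 900 records, far above the even share.
makespan = max(loads)
even_share = len(records) / num_reducers
print("loads per reducer:", loads)
print("slowest reducer:", makespan, "even share:", even_share)
```

Because every record with a given key lands on one reducer, no choice of hash function can split the hot key; this is the situation a finer-grained, balanced partitioning scheme is meant to address.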
This work proposes a balanced partitioning mechanism for MapReduce based
on a condensed trie, which evenly distributes the data to the reducers. We
then propose a quasi-optimal packing algorithm that assigns sub-partitions
to the reducers evenly, thereby reducing the total execution time. The
proposed partitioning mechanism requires a reasonable amount of memory
and incurs only a small running overhead. Experiments using inverted
indexing on several real-world datasets are conducted to evaluate the
performance of the proposed partitioning mechanism.
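The idea of packing sub-partitions onto reducers can be sketched with a standard greedy heuristic (largest sub-partition first, onto the least-loaded reducer). This is a generic stand-in and not the quasi-optimal packing algorithm proposed here, whose details are not given in this summary; the function name and the sub-partition sizes are hypothetical.

```python
import heapq


def pack_subpartitions(sizes, num_reducers):
    """Greedy packing: place each sub-partition, largest first,
    on the reducer with the smallest current load."""
    heap = [(0, r) for r in range(num_reducers)]  # entries are (load, reducer id)
    heapq.heapify(heap)
    assignment = {r: [] for r in range(num_reducers)}
    order = sorted(range(len(sizes)), key=lambda i: sizes[i], reverse=True)
    for idx in order:
        load, r = heapq.heappop(heap)  # least-loaded reducer so far
        assignment[r].append(idx)
        heapq.heappush(heap, (load + sizes[idx], r))
    loads = [sum(sizes[i] for i in assignment[r]) for r in range(num_reducers)]
    return assignment, loads


# Hypothetical sub-partition sizes (e.g. record counts per sub-partition).
sizes = [70, 50, 40, 30, 30, 20, 10, 10]
assignment, loads = pack_subpartitions(sizes, num_reducers=3)
print("assignment:", assignment)
print("loads:", loads, "slowest reducer:", max(loads))
```

Splitting the data into more sub-partitions than reducers is what gives a packing step room to even out the loads; with one partition per reducer there is nothing to rearrange.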
|