Summary: | Master's === National Cheng Kung University === Institute of Computer and Communication Engineering === 100 === In recent years, the Internet has been widely adopted, and social network services have grown popular, bringing a variety of applications and services into our daily lives. For application developers, taking advantage of this data explosion through analysis is the only way to make their applications stand out.
However, when data grows to the terabyte or petabyte scale, legacy programming models can hardly handle such volumes. These massive datasets have come to be called Big Data. MapReduce, proposed by Google in 2004 [1], is a programming model designed specifically to process such Big Data.
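To make the model concrete, the following is a minimal, self-contained word-count sketch of the map, shuffle, and reduce phases that frameworks such as Hadoop distribute across a cluster; the function names and sample documents are illustrative and not part of the thesis.

# A minimal sketch of the MapReduce programming model (word count),
# showing the map -> shuffle -> reduce phases a framework distributes.
from collections import defaultdict

def map_phase(document):
    # Emit an intermediate (key, value) pair for every word.
    for word in document.split():
        yield word.lower(), 1

def shuffle(pairs):
    # Group intermediate values by key, as the framework does between phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Aggregate all values emitted for one key.
    return key, sum(values)

documents = ["big data needs big clusters", "mapreduce processes big data"]
pairs = [pair for doc in documents for pair in map_phase(doc)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
print(counts)  # e.g. {'big': 3, 'data': 2, ...}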
Big Data processing has now been studied for nearly eight years, and many organizations have adopted MapReduce as their data processing platform. However, the original MapReduce was designed for dedicated, homogeneous environments. When different kinds of computing nodes are combined to obtain higher performance, MapReduce performs poorly and cannot make full use of each node's computation capacity.
To make the most of heterogeneous computation resources, we propose a task scheduling policy for heterogeneous MapReduce clusters. The policy improves on the original programming model by analyzing the MapReduce processing flow and exploiting its unique characteristics. With our scheduling policy, the unexpected performance degradation no longer occurs.
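As an illustration only, and not the policy proposed in this thesis, the sketch below shows one simple heterogeneity-aware heuristic: greedily assigning each task to the node with the earliest projected finish time given its measured capacity. All names here (schedule, task_costs, node_capacities) are hypothetical.

# A hypothetical heterogeneity-aware assignment sketch, NOT the thesis's
# policy: each task goes to the node that would finish it earliest.
import heapq

def schedule(task_costs, node_capacities):
    # task_costs: work units per task; node_capacities: units/second per node.
    # Min-heap keyed by each node's projected finish time.
    nodes = [(0.0, name, cap) for name, cap in node_capacities.items()]
    heapq.heapify(nodes)
    assignment = {}
    # Place the largest tasks first (longest-processing-time heuristic).
    for task, cost in sorted(task_costs.items(), key=lambda t: -t[1]):
        finish, name, cap = heapq.heappop(nodes)
        finish += cost / cap  # slower nodes accumulate time faster
        assignment[task] = name
        heapq.heappush(nodes, (finish, name, cap))
    return assignment

# Example: a fast node receives proportionally more work than a slow one.
print(schedule({"t1": 4, "t2": 2, "t3": 2}, {"fast": 2.0, "slow": 1.0}))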
|