Summary: | Master's thesis === 國立彰化師範大學 === Department of Computer Science and Information Engineering === 103 === The world of big data has arrived. In just the last few years, we have produced more data than in all of prior human history, and obtaining information is easier than ever before. However, high-value data typically constitutes less than 2% of a massive data set, so many researchers study how to process such data sets with data mining algorithms on the Hadoop platform. As the number of consumers of cloud services grows significantly, it becomes apparent that capacity-oriented clouds need to be federated. Inter-clouds, an architecture that combines multiple cloud service clusters, is a promising solution to this problem: remote cloud resources are dispatched when the computing capacity of the local cluster becomes saturated. The inter-cloud architecture can substantially improve the speed of data processing and mitigate the Hadoop NameNode single-point-of-failure problem.
Much research on improving the effectiveness of the Hadoop platform has also been performed, covering task scheduling, parameter optimization, file system improvements, and other issues. However, that research assumes a single cloud. In this paper, we present a coordinator that connects two Hadoop clusters and schedules jobs according to the differing computing capacities and system resources of the two clusters. We evaluate this meta-scheduler for inter-clouds experimentally by processing a randomly generated data set (produced with a data generator) using the FP-Growth algorithm, which is highly time-consuming.
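The core scheduling idea described above, preferring the local cluster and dispatching to the remote one when local capacity is saturated, can be sketched as follows. This is a minimal illustrative model, not the thesis implementation; the names `Cluster` and `submit`, and the slot-counting capacity model, are assumptions introduced here.

```python
# Hypothetical sketch of an inter-cloud meta-scheduler: jobs go to the
# local Hadoop cluster while it has capacity, and are dispatched to the
# remote cluster once the local one is saturated. All names and the
# slot-based capacity model are illustrative assumptions.
from dataclasses import dataclass


@dataclass
class Cluster:
    name: str
    capacity: int       # total task slots available on the cluster
    running: int = 0    # slots currently occupied by running tasks

    def free_slots(self) -> int:
        return self.capacity - self.running


def submit(job_slots: int, local: Cluster, remote: Cluster) -> str:
    """Place a job needing `job_slots` slots; prefer local, fall back to remote."""
    target = local if local.free_slots() >= job_slots else remote
    target.running += job_slots
    return target.name


if __name__ == "__main__":
    local = Cluster("local", capacity=4)
    remote = Cluster("remote", capacity=8)
    print(submit(3, local, remote))   # local cluster still has room
    print(submit(3, local, remote))   # local saturated, dispatched remotely
```

A real coordinator would also account for data locality, transfer cost between clusters, and per-cluster resource heterogeneity, which this sketch deliberately omits.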
|