Summary: | Master's === National Taichung University of Science and Technology === Master's Program, Department of Computer Science and Information Engineering === 104 === With the rapid development of cloud technology and the arrival of the Internet of Things, cloud-related software and hardware have been continuously upgraded, and cloud applications are becoming widespread in daily life; providing a highly reliable cloud environment for cloud systems and services is therefore very important. IT professionals, however, face great challenges in maintaining and operating cloud system platforms. It is therefore necessary to dynamically collect and merge the log data of a cloud system platform in order to monitor its maintenance and operating condition. This thesis proposes a centralized log management and analysis system for the open-source OpenStack cloud operating system. The system dynamically collects, stores, and visualizes statistics of the distributed log data in an OpenStack system, and works with the open-source Apache Spark distributed computing framework to perform exploratory analysis of the log data, providing a solution for high-performance data analysis. The study further evaluates the Spark distributed computing framework, including the performance differences in size scheduling between Spark streaming analysis and batch analysis running in Mesos mode and in YARN mode respectively. Moreover, the Streaming-KMeans algorithm and a regression algorithm based on Spark MLlib are applied for predictive model analysis. Different algorithm parameter settings, different numbers of cluster nodes, and different memory sizes can affect the performance and accuracy of the model. From these results, the optimal parameter settings and parallelization method can be estimated.
|