Summary: | Master's === National Chiao Tung University === Institute of Computer Science and Engineering === 104 === With the development of technology, the amount of data is growing exponentially. This makes data clustering increasingly important, since clustering is a key technique in data exploration.
Clustering is an unsupervised learning method, so improving performance and obtaining robust clustering results are challenging tasks in machine learning. Moreover, specifying the number of clusters is another problem for a certain class of clustering algorithms. Previous studies have shown that ensemble learning, which combines many clustering methods and aggregates their results, can yield a better and more robust result than a single method. This thesis proposes a feature-based ensemble clustering model based on the Indian Buffet Process (IBP). The proposed model does not need to know the number of clusters in advance; it obtains the most suitable number for the data during the clustering process. The proposed method uses quality and diversity as performance criteria to select feature subsets based on the IBP and a proposed greedy algorithm. Each feature subset is treated as a view of the data, and each subset yields ten clustering results. The final clustering result is obtained by combining these results with the proposed aggregation algorithm. The experimental results indicate that the proposed model generally outperforms other unsupervised methods.
|
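
The pipeline described in the summary (several feature subsets as views, ten clusterings per view, one aggregated result) can be illustrated with a minimal sketch. It is not the thesis's method: the IBP-based subset selection and the proposed aggregation algorithm are not specified in the summary, so this sketch assumes hand-picked feature subsets, plain k-means as the base clusterer with a fixed `k`, and a generic co-association (evidence accumulation) step as a stand-in for the aggregation. All function names and the `threshold` parameter are hypothetical.

```python
import random


def kmeans(points, k, rng, iters=20):
    """Plain Lloyd's k-means (assumed base clusterer); returns one label per point."""
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center (squared Euclidean distance).
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Move each non-empty cluster's center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def co_association(labelings, n):
    """Fraction of clusterings in which each pair of points shares a cluster."""
    co = [[0.0] * n for _ in range(n)]
    for labels in labelings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    co[i][j] += 1.0 / len(labelings)
    return co


def consensus(co, threshold=0.5):
    """Link pairs co-clustered in more than `threshold` of the runs.

    Connected components of the resulting graph are the final clusters,
    so the number of final clusters is not fixed in advance.
    """
    n = len(co)
    final = [-1] * n
    next_label = 0
    for start in range(n):
        if final[start] != -1:
            continue
        final[start] = next_label
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if final[j] == -1 and co[i][j] > threshold:
                    final[j] = next_label
                    stack.append(j)
        next_label += 1
    return final


def ensemble_cluster(data, feature_subsets, k=2, runs_per_view=10, seed=0):
    """Cluster each view (feature subset) several times, then aggregate."""
    rng = random.Random(seed)
    labelings = []
    for subset in feature_subsets:
        # Project every row onto the features of this view.
        view = [tuple(row[f] for f in subset) for row in data]
        for _ in range(runs_per_view):
            labelings.append(kmeans(view, k, rng))
    return consensus(co_association(labelings, len(data)))


# Toy data: two well-separated groups of three points, three features each.
data = [
    (0.0, 0.0, 0.0), (0.1, 0.2, 0.1), (0.2, 0.1, 0.0),
    (10.0, 10.0, 10.0), (10.1, 9.9, 10.2), (9.8, 10.1, 10.0),
]
views = [(0, 1), (1, 2), (0, 2)]  # hand-picked feature subsets (assumption)
result = ensemble_cluster(data, views)
print(result)
```

With `runs_per_view=10`, each view contributes ten clusterings, matching the count mentioned in the summary. The thresholded co-association graph is one common way to let the number of final clusters emerge from the ensemble; the thesis may do this differently.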