Summary: | Master's === National Chiao Tung University === Institute of Computer Science and Engineering === 100 === Semi-supervised clustering methods, which aim to cluster a data set under the guidance of some supervisory information, have become a topic of significant research. The supervisory information is usually used as constraints that bias the clustering toward a good region of the search space. In this paper, we propose a semi-supervised algorithm, Constrained-Nonnegative Matrix Factorization, which uses a small amount of labeled data as constraints to cluster data. The proposed algorithm is a matrix factorization algorithm. Intuitively, a good initial point can speed up clustering convergence and may lead to a better locally optimal solution. We therefore devise an algorithm called Constrained-Fuzzy C-means to obtain the initial point. The evaluation function is a key element in assessing the solution computed by Constrained-Nonnegative Matrix Factorization, so we also discuss how that solution should be evaluated. Finally, we conduct experiments on several data sets, including CiteULike, Classic3, 20 Newsgroups, and Reuters, and compare the results with other semi-supervised learning algorithms. The experimental results indicate that the proposed method can effectively improve clustering performance.
|
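
The abstract does not give the update rules, so the following is only a minimal sketch of the general idea of nonnegative matrix factorization guided by a few labels, not the thesis's actual formulation. It assumes standard multiplicative updates for the squared-error objective, and it assumes (hypothetically) that the constraint is enforced by clamping each labeled sample's membership row to a one-hot vector. It also uses random initialization in place of the Constrained-Fuzzy C-means initialization described above. The function name `constrained_nmf_sketch` and all parameters are illustrative.

```python
import numpy as np

def constrained_nmf_sketch(X, k, labels, n_iter=200, eps=1e-9, seed=0):
    """Illustrative only: NMF X ~ W @ H with labeled rows of W clamped.

    X      : (n_samples, n_features) nonnegative array
    k      : number of clusters
    labels : length-n integer array; cluster index for labeled samples,
             -1 for unlabeled samples
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    n, m = X.shape
    rng = np.random.default_rng(seed)

    # ASSUMPTION: random start (the thesis uses Constrained-Fuzzy C-means here).
    W = rng.random((n, k)) + eps   # sample-to-cluster memberships
    H = rng.random((k, m)) + eps   # cluster basis vectors

    idx = np.flatnonzero(labels >= 0)  # positions of labeled samples

    def clamp(W):
        # ASSUMPTION: a labeled sample keeps a fixed one-hot membership row.
        W[idx] = 0.0
        W[idx, labels[idx]] = 1.0

    clamp(W)
    for _ in range(n_iter):
        # Standard multiplicative updates for the squared-error objective.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
        clamp(W)  # re-impose the label constraints after each update
    return W, H
```

Under these assumptions, a cluster assignment for every sample can be read off as `W.argmax(axis=1)`; labeled samples always keep their given cluster because their rows are clamped.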