Clustering with Labeled and Unlabeled Data Based on Constrained-Nonnegative Matrix Factorization

Master's === National Chiao Tung University === Institute of Computer Science and Engineering === 100 === Semi-supervised clustering methods, which aim to cluster a data set under the guidance of some supervisory information, have become a topic of significant research. The supervisory information is usually used as constraints to bias clustering toward a g...

Bibliographic Details
Main Authors: Li, Hsuan-Hsun, 李炫勳
Other Authors: Lee, Chia-Hoang
Format: Others
Language: zh-TW
Published: 2012
Online Access: http://ndltd.ncl.edu.tw/handle/30554527577087405135
id ndltd-TW-100NCTU5394108
record_format oai_dc
spelling ndltd-TW-100NCTU5394108 2016-03-28T04:20:38Z http://ndltd.ncl.edu.tw/handle/30554527577087405135 Clustering with Labeled and Unlabeled Data Based on Constrained-Nonnegative Matrix Factorization 基於Constrained-Nonnegative Matrix Factorization之半監督式分群法 (A Semi-supervised Clustering Method Based on Constrained-Nonnegative Matrix Factorization) Li, Hsuan-Hsun 李炫勳 Master's National Chiao Tung University Institute of Computer Science and Engineering 100 Semi-supervised clustering methods, which aim to cluster a data set under the guidance of some supervisory information, have become a topic of significant research. The supervisory information is usually used as constraints to bias clustering toward a good region of the search space. In this paper, we propose a semi-supervised algorithm, Constrained-Nonnegative Matrix Factorization, which uses a small amount of labeled data as constraints to cluster data. The proposed algorithm is a matrix factorization algorithm. Intuitively, a good initial point can speed up clustering convergence and may lead to a better locally optimal solution. As a result, we devise an algorithm called Constrained-Fuzzy C-means to obtain the initial point. The evaluation function is a key element in evaluating the solution calculated by Constrained-Nonnegative Matrix Factorization, so we discuss the evaluation of Constrained-Nonnegative Matrix Factorization. Finally, we conduct experiments on several data sets, including CiteULike, Classic3, 20 Newsgroups, and Reuters, and compare with other semi-supervised learning algorithms. The experimental results indicate that the proposed method can effectively improve clustering performance. Lee, Chia-Hoang 李嘉晃 2012 degree thesis ; thesis 39 zh-TW
collection NDLTD
language zh-TW
format Others
sources NDLTD
description Master's === National Chiao Tung University === Institute of Computer Science and Engineering === 100 === Semi-supervised clustering methods, which aim to cluster a data set under the guidance of some supervisory information, have become a topic of significant research. The supervisory information is usually used as constraints to bias clustering toward a good region of the search space. In this paper, we propose a semi-supervised algorithm, Constrained-Nonnegative Matrix Factorization, which uses a small amount of labeled data as constraints to cluster data. The proposed algorithm is a matrix factorization algorithm. Intuitively, a good initial point can speed up clustering convergence and may lead to a better locally optimal solution. As a result, we devise an algorithm called Constrained-Fuzzy C-means to obtain the initial point. The evaluation function is a key element in evaluating the solution calculated by Constrained-Nonnegative Matrix Factorization, so we discuss the evaluation of Constrained-Nonnegative Matrix Factorization. Finally, we conduct experiments on several data sets, including CiteULike, Classic3, 20 Newsgroups, and Reuters, and compare with other semi-supervised learning algorithms. The experimental results indicate that the proposed method can effectively improve clustering performance.
author2 Lee, Chia-Hoang
author_facet Lee, Chia-Hoang
Li, Hsuan-Hsun
李炫勳
author Li, Hsuan-Hsun
李炫勳
spellingShingle Li, Hsuan-Hsun
李炫勳
Clustering with Labeled and Unlabeled Data Based on Constrained-Nonnegative Matrix Factorization
author_sort Li, Hsuan-Hsun
title Clustering with Labeled and Unlabeled Data Based on Constrained-Nonnegative Matrix Factorization
title_short Clustering with Labeled and Unlabeled Data Based on Constrained-Nonnegative Matrix Factorization
title_full Clustering with Labeled and Unlabeled Data Based on Constrained-Nonnegative Matrix Factorization
title_fullStr Clustering with Labeled and Unlabeled Data Based on Constrained-Nonnegative Matrix Factorization
title_full_unstemmed Clustering with Labeled and Unlabeled Data Based on Constrained-Nonnegative Matrix Factorization
title_sort clustering with labeled and unlabeled data based on constrained-nonnegative matrix factorization
publishDate 2012
url http://ndltd.ncl.edu.tw/handle/30554527577087405135
work_keys_str_mv AT lihsuanhsun clusteringwithlabeledandunlabeleddatabasedonconstrainednonnegativematrixfactorization
AT lǐxuànxūn clusteringwithlabeledandunlabeleddatabasedonconstrainednonnegativematrixfactorization
AT lihsuanhsun jīyúconstrainednonnegativematrixfactorizationzhībànjiāndūshìfēnqúnfǎ
AT lǐxuànxūn jīyúconstrainednonnegativematrixfactorizationzhībànjiāndūshìfēnqúnfǎ
_version_ 1718213422733066240
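
Illustrative sketch (not part of the catalog record, and not the thesis's own algorithm): the abstract in this record describes clustering by nonnegative matrix factorization with a small amount of labeled data used as constraints. One simple way to picture that idea, under stated assumptions, is to factor a nonnegative data matrix X (features x samples) as W @ H and hold the columns of H that belong to labeled samples fixed at one-hot cluster indicators, while the remaining columns are learned with the standard multiplicative updates. The function name constrained_nmf, the clamping scheme, the toy data, and the random initialization are all assumptions made for illustration; in particular, the random start stands in for the Constrained-Fuzzy C-means initializer mentioned in the abstract, which the record does not specify.

```python
import numpy as np


def constrained_nmf(X, k, labels, n_iter=200, eps=1e-9, seed=0):
    """Factor X (features x samples) as W @ H with W, H >= 0.

    labels[j] is a cluster index in [0, k) for a labeled sample and -1 for
    an unlabeled one. Columns of H for labeled samples are held fixed at
    one-hot indicators (an assumed way of imposing the label constraints).
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels, dtype=int)
    labeled = labels >= 0
    rng = np.random.default_rng(seed)
    n_features, n_samples = X.shape

    # Random positive start (stand-in for a smarter initializer).
    W = rng.random((n_features, k)) + eps
    H = rng.random((k, n_samples)) + eps

    # Clamp labeled columns of H to one-hot cluster indicators.
    H[:, labeled] = 0.0
    H[labels[labeled], np.nonzero(labeled)[0]] = 1.0

    for _ in range(n_iter):
        # Multiplicative update for W (Frobenius-norm objective).
        W *= (X @ H.T) / (W @ H @ H.T + eps)
        # Multiplicative update for H, applied to unlabeled columns only.
        H_new = H * (W.T @ X) / (W.T @ W @ H + eps)
        H[:, ~labeled] = H_new[:, ~labeled]

    # Each sample is assigned to the component with the largest weight.
    return W, H, H.argmax(axis=0)


# Toy data: samples 0-2 use features 0-1, samples 3-5 use features 2-3.
X = np.array([
    [4.0, 5.0, 3.0, 0.0, 0.0, 0.0],
    [4.0, 5.0, 3.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 4.0, 5.0, 3.0],
    [0.0, 0.0, 0.0, 4.0, 5.0, 3.0],
])
labels = [0, -1, -1, 1, -1, -1]  # one labeled sample per cluster
W, H, assignments = constrained_nmf(X, k=2, labels=labels)
print(assignments)
```

Updating only the unlabeled columns keeps each step a valid multiplicative update for the part of H that is free to change, so the labeled samples act as fixed anchors that tie each component to a known cluster.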