Data Clustering with Complete Must-Link Constraints
Master's === National Taiwan University of Science and Technology === Department of Industrial Management === 102 === This research aims to develop an integrated method for a special but not uncommon constrained clustering problem defined by Complete Must-Link (CML) constraints. Constrained clustering analysis is a semi-supervised learning approach that accommodates the informatio...
Main Author: | Maisyatus Suadaa Irfana |
---|---|
Other Authors: | Chao-Lung Yang |
Format: | Others |
Language: | en_US |
Published: | 2014 |
Online Access: | http://ndltd.ncl.edu.tw/handle/qrdgn9 |
id |
ndltd-TW-102NTUS5041007 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-TW-102NTUS5041007 2019-05-15T21:13:19Z http://ndltd.ncl.edu.tw/handle/qrdgn9 Data Clustering with Complete Must-Link Constraints 在完整必連限制條件下的資料分群 Maisyatus Suadaa Irfana Maisyatus Suadaa Irfana Master's National Taiwan University of Science and Technology Department of Industrial Management 102 This research aims to develop an integrated method for a special but not uncommon constrained clustering problem defined by Complete Must-Link (CML) constraints. Constrained clustering analysis is a semi-supervised learning approach that incorporates available information to improve the efficiency and purity of clustering. The CML clustering problem can be viewed as aggregating pre-defined data groups. Through the transitive closure process of data aggregation, the data in each group are replaced by their centroid for clustering analysis. This causes an information loss: the data distribution or shape of the original group is discarded, which matters especially when groups overlap each other. To overcome this problem, a new method named PCA-CML is proposed for the CML constrained clustering problem. Principal component analysis (PCA), which provides supplemental information describing the original partition blocks, is included in the distance matrix of the constrained clustering algorithm when the blocks overlap. An overlapped ratio is introduced to determine whether CML data partitions overlap. The proposed algorithm is tested on simulated datasets, both overlapped and non-overlapped, and on a real-world dataset containing cartridge quality information. The experimental results indicate that the proposed algorithm can alleviate the missing-information issue in CML constrained clustering when the pre-defined CML partitions overlap. Chao-Lung Yang 楊朝龍博士 2014 degree thesis ; thesis 67 en_US |
collection |
NDLTD |
language |
en_US |
format |
Others |
sources |
NDLTD |
description |
Master's === National Taiwan University of Science and Technology === Department of Industrial Management === 102 === This research aims to develop an integrated method for a special but not uncommon constrained clustering problem defined by Complete Must-Link (CML) constraints. Constrained clustering analysis is a semi-supervised learning approach that incorporates available information to improve the efficiency and purity of clustering. The CML clustering problem can be viewed as aggregating pre-defined data groups. Through the transitive closure process of data aggregation, the data in each group are replaced by their centroid for clustering analysis. This causes an information loss: the data distribution or shape of the original group is discarded, which matters especially when groups overlap each other. To overcome this problem, a new method named PCA-CML is proposed for the CML constrained clustering problem. Principal component analysis (PCA), which provides supplemental information describing the original partition blocks, is included in the distance matrix of the constrained clustering algorithm when the blocks overlap. An overlapped ratio is introduced to determine whether CML data partitions overlap. The proposed algorithm is tested on simulated datasets, both overlapped and non-overlapped, and on a real-world dataset containing cartridge quality information. The experimental results indicate that the proposed algorithm can alleviate the missing-information issue in CML constrained clustering when the pre-defined CML partitions overlap. |
author2 |
Chao-Lung Yang |
author_facet |
Chao-Lung Yang Maisyatus Suadaa Irfana Maisyatus Suadaa Irfana |
author |
Maisyatus Suadaa Irfana Maisyatus Suadaa Irfana |
spellingShingle |
Maisyatus Suadaa Irfana Maisyatus Suadaa Irfana Data Clustering with Complete Must-Link Constraints |
author_sort |
Maisyatus Suadaa Irfana |
title |
Data Clustering with Complete Must-Link Constraints |
publishDate |
2014 |
url |
http://ndltd.ncl.edu.tw/handle/qrdgn9 |
work_keys_str_mv |
AT maisyatussuadaairfana dataclusteringwithcompletemustlinkconstraints AT maisyatussuadaairfana dataclusteringwithcompletemustlinkconstraints AT maisyatussuadaairfana zàiwánzhěngbìliánxiànzhìtiáojiànxiàdezīliàofēnqún AT maisyatussuadaairfana zàiwánzhěngbìliánxiànzhìtiáojiànxiàdezīliàofēnqún |
_version_ |
1719110825064529920 |
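
The abstract in this record notes that replacing each must-link group by its centroid discards the group's shape. The following is a minimal sketch of that centroid-replacement step only, not the thesis's PCA-CML method; the function name `group_centroids` and the sample points are hypothetical and chosen purely for illustration.

```python
from collections import defaultdict


def group_centroids(points, group_ids):
    """Return {group_id: centroid} for points that share a must-link group."""
    members = defaultdict(list)
    for point, gid in zip(points, group_ids):
        members[gid].append(point)

    centroids = {}
    for gid, pts in members.items():
        dim = len(pts[0])
        # Coordinate-wise mean of all points in the group
        centroids[gid] = tuple(sum(p[i] for p in pts) / len(pts) for i in range(dim))
    return centroids


# Hypothetical data: two groups with the same centre but different shapes
points = [(-2.0, 0.0), (2.0, 0.0),   # group "a": spread along the x axis
          (0.0, -2.0), (0.0, 2.0)]   # group "b": spread along the y axis
group_ids = ["a", "a", "b", "b"]

centroids = group_centroids(points, group_ids)
print(centroids)
```

In this toy case both groups collapse to the same centroid even though one is elongated horizontally and the other vertically, which is the kind of lost shape information the abstract says the PCA-based supplement is meant to recover.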