Data Clustering with Complete Must-Link Constraints

Master's === National Taiwan University of Science and Technology === Department of Industrial Management === 102 === This research aims to develop an integrated method for a special but not uncommon constrained clustering problem defined by Complete Must-Link (CML) constraints. Constrained clustering analysis is a semi-supervised learning approach to accommodate the informatio...

Bibliographic Details
Main Author: Maisyatus Suadaa Irfana
Other Authors: Chao-Lung Yang
Format: Others
Language: en_US
Published: 2014
Online Access: http://ndltd.ncl.edu.tw/handle/qrdgn9
id ndltd-TW-102NTUS5041007
record_format oai_dc
spelling ndltd-TW-102NTUS5041007 2019-05-15T21:13:19Z http://ndltd.ncl.edu.tw/handle/qrdgn9 Data Clustering with Complete Must-Link Constraints 在完整必連限制條件下的資料分群 Maisyatus Suadaa Irfana Master's National Taiwan University of Science and Technology Department of Industrial Management 102 Chao-Lung Yang 楊朝龍博士 2014 學位論文 ; thesis 67 en_US
collection NDLTD
language en_US
format Others
sources NDLTD
description Master's === National Taiwan University of Science and Technology === Department of Industrial Management === 102 === This research aims to develop an integrated method for a special but not uncommon constrained clustering problem defined by Complete Must-Link (CML) constraints. Constrained clustering analysis is a semi-supervised learning approach that incorporates side information, when it is available, to improve the efficiency and purity of clustering. The CML clustering problem can be viewed as aggregating pre-defined data groups. Through the transitive closure process of data aggregation, the data in each group are replaced by their centroid for clustering analysis. This causes a missing-information issue: the data distribution or shape of the original group is lost, which matters especially when groups overlap one another. To overcome this problem, this research proposes a new method, named PCA-CML, for the CML constrained clustering problem. Principal component analysis (PCA), which provides supplemental information describing the original partition blocks, is proposed for inclusion in the distance matrix of the constrained clustering algorithm when the blocks overlap. An overlapped ratio is introduced to determine whether CML data partitions overlap. The proposed algorithm is tested on simulated datasets, containing both overlapped and non-overlapped partitions, and on a real-world dataset containing cartridge quality information. The experimental results indicate that the proposed algorithm can alleviate the missing-information issue in CML constrained clustering when the pre-defined CML partitions overlap.
author2 Chao-Lung Yang
author_facet Chao-Lung Yang
Maisyatus Suadaa Irfana
author Maisyatus Suadaa Irfana
spellingShingle Maisyatus Suadaa Irfana
Data Clustering with Complete Must-Link Constraints
author_sort Maisyatus Suadaa Irfana
title Data Clustering with Complete Must-Link Constraints
title_short Data Clustering with Complete Must-Link Constraints
title_full Data Clustering with Complete Must-Link Constraints
title_fullStr Data Clustering with Complete Must-Link Constraints
title_full_unstemmed Data Clustering with Complete Must-Link Constraints
title_sort data clustering with complete must-link constraints
publishDate 2014
url http://ndltd.ncl.edu.tw/handle/qrdgn9
work_keys_str_mv AT maisyatussuadaairfana dataclusteringwithcompletemustlinkconstraints
AT maisyatussuadaairfana zàiwánzhěngbìliánxiànzhìtiáojiànxiàdezīliàofēnqún
_version_ 1719110825064529920
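
The description field above mentions three steps: taking the transitive closure of must-link constraints, replacing each resulting group by its centroid, and using PCA to describe the shape of each original group. The following is a minimal sketch of those three steps only, not the thesis's PCA-CML implementation. It assumes 2-D points, and all function names (`must_link_groups`, `centroid`, `principal_axis`) are hypothetical. How the PCA information enters the distance matrix, and how the overlapped ratio is computed, are not specified in this record and are therefore not sketched.

```python
from collections import defaultdict
import math


def must_link_groups(n_points, must_links):
    """Transitive closure of must-link pairs, computed with union-find.

    Returns a sorted list of groups, each a list of point indices.
    """
    parent = list(range(n_points))

    def find(i):
        # Follow parent pointers to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b in must_links:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)

    groups = defaultdict(list)
    for i in range(n_points):
        groups[find(i)].append(i)
    return sorted(groups.values())


def centroid(points):
    """Mean of a list of 2-D points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def principal_axis(points):
    """Unit vector along the first principal component of 2-D points.

    Uses the closed-form orientation of a 2x2 scatter matrix; the
    direction is arbitrary when the group has no dominant axis.
    """
    cx, cy = centroid(points)
    sxx = sum((x - cx) ** 2 for x, _ in points)
    syy = sum((y - cy) ** 2 for _, y in points)
    sxy = sum((x - cx) * (y - cy) for x, y in points)
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    return (math.cos(theta), math.sin(theta))


if __name__ == "__main__":
    # Illustrative data only (not from the thesis).
    data = [(0, 0), (1, 0), (2, 0), (5, 5), (5, 6), (9, 9)]
    links = [(0, 1), (1, 2), (3, 4)]
    for group in must_link_groups(len(data), links):
        members = [data[i] for i in group]
        print(group, centroid(members), principal_axis(members))
```

Keeping the principal axis next to each centroid is one way to retain some of the group shape that the centroid alone discards, which is the missing-information issue the abstract describes.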