A New Cluster Approach to Generating Balanced Clusters

Master's === National Changhua University of Education === Department of Mathematics === 104 === Clustering is a common data mining technique. It partitions a set of samples into subsets, each of which forms a cluster. The main principle of clustering is that the samples in a cluster are similar to one another yet dissimilar to samples in other clusters. In...

Bibliographic Details
Main Authors:	Hua,Po-Wei, 華柏維
Other Authors:	Tsai,Cheng-Jung
Format:	Others
Language:	zh-TW
Published:	2016
Online Access:	http://ndltd.ncl.edu.tw/handle/59785902964171039535

id	ndltd-TW-104NCUE5479007
record_format	oai_dc
spelling	ndltd-TW-104NCUE54790072017-08-27T04:30:15Z http://ndltd.ncl.edu.tw/handle/59785902964171039535 A New Cluster Approach to Generating Balanced Clusters 一個新的平衡分群演算法之研究 Hua,Po-Wei 華柏維 Master's, National Changhua University of Education, Department of Mathematics 104 Clustering is a common data mining technique. It partitions a set of samples into subsets, each of which forms a cluster. The main principle of clustering is that the samples in a cluster are similar to one another yet dissimilar to samples in other clusters. In other words, samples within a cluster have high homogeneity, and different clusters have high heterogeneity. Clustering has been widely used in many applications, such as biotechnology, heterogeneous networks, business finance, information retrieval, and economics. In daily life, however, a user may instead want “balanced clusters”. In contrast to traditional clustering, the aim of balanced clustering is to make the samples within the same cluster highly heterogeneous while keeping different clusters highly homogeneous with one another. Balanced clustering is practical in daily life, for example in balanced diets, job assignment, normal class grouping, and cluster sampling. However, according to our survey of related papers in the field of data mining, no research on balanced clustering has been published. Currently, two common methods for generating balanced clusters are the "random assignment method" and the "S-shape placement method". The S-shape placement method has two main weaknesses. First, it generates balanced clusters by considering only ranking, not the actual differences between samples. Second, it uses the mean to handle multidimensional data. The random assignment method, for its part, has difficulty generating good balanced clusters. To solve these problems, this paper proposes a new balanced clustering algorithm named Ripple. Experimental results showed that Ripple outperforms the S-shape placement method and the random assignment method at generating good balanced clusters. Tsai,Cheng-Jung 蔡政容 2016 thesis 32 zh-TW
collection	NDLTD
language	zh-TW
format	Others
sources	NDLTD
description	Master's === National Changhua University of Education === Department of Mathematics === 104 === Clustering is a common data mining technique. It partitions a set of samples into subsets, each of which forms a cluster. The main principle of clustering is that the samples in a cluster are similar to one another yet dissimilar to samples in other clusters. In other words, samples within a cluster have high homogeneity, and different clusters have high heterogeneity. Clustering has been widely used in many applications, such as biotechnology, heterogeneous networks, business finance, information retrieval, and economics. In daily life, however, a user may instead want “balanced clusters”. In contrast to traditional clustering, the aim of balanced clustering is to make the samples within the same cluster highly heterogeneous while keeping different clusters highly homogeneous with one another. Balanced clustering is practical in daily life, for example in balanced diets, job assignment, normal class grouping, and cluster sampling. However, according to our survey of related papers in the field of data mining, no research on balanced clustering has been published. Currently, two common methods for generating balanced clusters are the "random assignment method" and the "S-shape placement method". The S-shape placement method has two main weaknesses. First, it generates balanced clusters by considering only ranking, not the actual differences between samples. Second, it uses the mean to handle multidimensional data. The random assignment method, for its part, has difficulty generating good balanced clusters. To solve these problems, this paper proposes a new balanced clustering algorithm named Ripple. Experimental results showed that Ripple outperforms the S-shape placement method and the random assignment method at generating good balanced clusters.
author2	Tsai,Cheng-Jung
author_facet	Tsai,Cheng-Jung Hua,Po-Wei 華柏維
author	Hua,Po-Wei 華柏維
spellingShingle	Hua,Po-Wei 華柏維 A New Cluster Approach to Generating Balanced Clusters
author_sort	Hua,Po-Wei
title	A New Cluster Approach to Generating Balanced Clusters
title_short	A New Cluster Approach to Generating Balanced Clusters
title_full	A New Cluster Approach to Generating Balanced Clusters
title_fullStr	A New Cluster Approach to Generating Balanced Clusters
title_full_unstemmed	A New Cluster Approach to Generating Balanced Clusters
title_sort	new cluster approach to generating balanced clusters
publishDate	2016
url	http://ndltd.ncl.edu.tw/handle/59785902964171039535
work_keys_str_mv	AT huapowei anewclusterapproachtogeneratingbalancedclusters AT huábǎiwéi anewclusterapproachtogeneratingbalancedclusters AT huapowei yīgèxīndepínghéngfēnqúnyǎnsuànfǎzhīyánjiū AT huábǎiwéi yīgèxīndepínghéngfēnqúnyǎnsuànfǎzhīyánjiū AT huapowei newclusterapproachtogeneratingbalancedclusters AT huábǎiwéi newclusterapproachtogeneratingbalancedclusters
_version_	1718519264578633728

Similar Items