A New Cluster Approach to Generating Balanced Clusters

Master's thesis === National Changhua University of Education (國立彰化師範大學) === Department of Mathematics === 104 === Clustering is a common data mining technique. It partitions a set of samples into subsets, each of which forms a cluster. The main principle of clustering is that the samples within a cluster are similar to one another, yet dissimilar to samples in other clusters. In...

Bibliographic Details
Main Authors: Hua,Po-Wei, 華柏維
Other Authors: Tsai,Cheng-Jung
Format: Others
Language: zh-TW
Published: 2016
Online Access: http://ndltd.ncl.edu.tw/handle/59785902964171039535
id ndltd-TW-104NCUE5479007
record_format oai_dc
spelling ndltd-TW-104NCUE5479007 2017-08-27T04:30:15Z http://ndltd.ncl.edu.tw/handle/59785902964171039535 A New Cluster Approach to Generating Balanced Clusters 一個新的平衡分群演算法之研究 Hua,Po-Wei 華柏維 碩士 國立彰化師範大學 數學系 104 Tsai,Cheng-Jung 蔡政容 2016 學位論文 ; thesis 32 zh-TW
collection NDLTD
language zh-TW
format Others
sources NDLTD
description Master's thesis === National Changhua University of Education (國立彰化師範大學) === Department of Mathematics === 104 === Clustering is a common data mining technique. It partitions a set of samples into subsets, each of which forms a cluster. The main principle of clustering is that the samples within a cluster are similar to one another, yet dissimilar to samples in other clusters. In other words, the samples within a cluster have high homogeneity, and different clusters have high heterogeneity. Clustering has been widely used in applications such as biotechnology, heterogeneous networks, business finance, information retrieval, and economics. In everyday life, however, a user may instead want "balanced clusters". In contrast to traditional clustering, balanced clustering aims to make the samples within the same cluster highly heterogeneous while keeping the different clusters highly homogeneous with respect to one another. Balanced clustering is practical in everyday tasks such as planning a balanced diet, assigning jobs, normal class grouping, and cluster sampling. However, according to our survey of related work in the field of data mining, no research on balanced clustering has been published. At present, the two common methods for generating balanced clusters are the "random assignment method" and the "S-shape placement method". The S-shape placement method has two main weaknesses. First, it generates balanced clusters by considering only the ranking of the samples, not the actual differences between them. Second, it uses the mean to handle multidimensional data. The random assignment method, for its part, has difficulty generating well-balanced clusters. To solve these problems, this thesis proposes a new balanced clustering algorithm named Ripple. Experimental results show that Ripple outperforms both the S-shape placement method and the random assignment method in generating well-balanced clusters.
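The two baseline methods named in the abstract can be illustrated with a minimal sketch. The abstract does not define them, so this assumes the usual reading: S-shape placement ranks the samples and deals them into groups in a serpentine (back-and-forth) order, and random assignment shuffles the samples and deals them out evenly. The function names, the one-dimensional scores, and the variance-of-group-means measure are all hypothetical choices made for illustration; the Ripple algorithm itself is not described in this record and is not shown.

```python
import random
from statistics import mean, pvariance


def s_shape_placement(scores, k):
    """Assign sample indices to k groups in serpentine order of rank.

    Assumed interpretation of "S-shape placement"; note that only the
    rank of each score is used, not the size of the gaps between scores.
    """
    # Rank samples from highest to lowest score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    groups = [[] for _ in range(k)]
    for pos, idx in enumerate(order):
        rnd, offset = divmod(pos, k)
        # Even rounds run left to right, odd rounds right to left.
        g = offset if rnd % 2 == 0 else k - 1 - offset
        groups[g].append(idx)
    return groups


def random_assignment(n, k, seed=0):
    """Shuffle n sample indices and deal them round-robin into k groups."""
    rng = random.Random(seed)
    order = list(range(n))
    rng.shuffle(order)
    return [order[g::k] for g in range(k)]


def spread_of_group_means(scores, groups):
    """Population variance of the group means (hypothetical balance measure).

    Smaller values mean the groups resemble one another more closely.
    """
    return pvariance([mean(scores[i] for i in g) for g in groups])


scores = [95, 90, 85, 80, 75, 70, 65, 60, 55]
s_groups = s_shape_placement(scores, 3)
r_groups = random_assignment(len(scores), 3)

print("S-shape groups:", s_groups, spread_of_group_means(scores, s_groups))
print("Random groups: ", r_groups, spread_of_group_means(scores, r_groups))
```

Because the serpentine order depends only on rank, two data sets with the same ordering but very different gaps between scores receive the same grouping, which is the first weakness the abstract attributes to S-shape placement.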
author2 Tsai,Cheng-Jung
author_facet Tsai,Cheng-Jung
Hua,Po-Wei
華柏維
author Hua,Po-Wei
華柏維
spellingShingle Hua,Po-Wei
華柏維
A New Cluster Approach to Generating Balanced Clusters
author_sort Hua,Po-Wei
title A New Cluster Approach to Generating Balanced Clusters
title_short A New Cluster Approach to Generating Balanced Clusters
title_full A New Cluster Approach to Generating Balanced Clusters
title_fullStr A New Cluster Approach to Generating Balanced Clusters
title_full_unstemmed A New Cluster Approach to Generating Balanced Clusters
title_sort new cluster approach to generating balanced clusters
publishDate 2016
url http://ndltd.ncl.edu.tw/handle/59785902964171039535
work_keys_str_mv AT huapowei anewclusterapproachtogeneratingbalancedclusters
AT huábǎiwéi anewclusterapproachtogeneratingbalancedclusters
AT huapowei yīgèxīndepínghéngfēnqúnyǎnsuànfǎzhīyánjiū
AT huábǎiwéi yīgèxīndepínghéngfēnqúnyǎnsuànfǎzhīyánjiū
AT huapowei newclusterapproachtogeneratingbalancedclusters
AT huábǎiwéi newclusterapproachtogeneratingbalancedclusters
_version_ 1718519264578633728