A New Cluster Approach to Generating Balanced Clusters
Master's thesis, National Changhua University of Education, Department of Mathematics, academic year 104. Clustering is a common data mining technique. Clustering partitions a set of samples into subsets, each of which forms a cluster. The main principle of clustering is that samples in the same cluster are similar to one another yet dissimilar to samples in other clusters. In...
Main Authors: | Hua, Po-Wei (華柏維) |
---|---|
Other Authors: | Tsai, Cheng-Jung (蔡政容) |
Format: | Others |
Language: | zh-TW |
Published: | 2016 |
Online Access: | http://ndltd.ncl.edu.tw/handle/59785902964171039535 |
id | ndltd-TW-104NCUE5479007 |
---|---|
record_format | oai_dc |
spelling | ndltd-TW-104NCUE5479007 2017-08-27T04:30:15Z http://ndltd.ncl.edu.tw/handle/59785902964171039535 A New Cluster Approach to Generating Balanced Clusters 一個新的平衡分群演算法之研究 (A Study of a New Balanced Clustering Algorithm) Hua, Po-Wei 華柏維 Master's thesis, National Changhua University of Education, Department of Mathematics, academic year 104 Tsai, Cheng-Jung 蔡政容 2016 thesis 32 zh-TW |
collection | NDLTD |
language | zh-TW |
format | Others |
sources | NDLTD |
description | Master's thesis, National Changhua University of Education, Department of Mathematics, academic year 104. Clustering is a common data mining technique. Clustering partitions a set of samples into subsets, each of which forms a cluster. The main principle of clustering is that samples in the same cluster are similar to one another yet dissimilar to samples in other clusters. In other words, samples within a cluster have high homogeneity, while different clusters have high heterogeneity. Clustering has been widely used in many applications, such as biotechnology, heterogeneous networks, business finance, information retrieval, and economics. In daily life, however, a user may want "balanced clusters" instead. In contrast to traditional clustering, balanced clustering aims to make the samples within the same cluster highly heterogeneous while making different clusters highly homogeneous. Balanced clustering has many practical uses, such as balanced diets, job assignment, normal (mixed-ability) class grouping, and cluster sampling. However, our survey of related papers in the data mining literature found no published research on balanced clustering. Currently, two common methods for generating balanced clusters are the "random assignment method" and the "S-shape placement method". The S-shape placement method has two main weaknesses. First, it generates balanced clusters by considering only the ranking of samples, not the actual differences between them. Second, it handles multidimensional data by taking the mean. The random assignment method, for its part, has difficulty generating good balanced clusters. To solve these problems, this thesis proposes a new balanced clustering algorithm named Ripple. Experimental results show that Ripple outperforms both the S-shape placement method and the random assignment method in generating good balanced clusters. |
author2 | Tsai, Cheng-Jung |
author | Hua, Po-Wei 華柏維 |
title | A New Cluster Approach to Generating Balanced Clusters |
publishDate | 2016 |
url | http://ndltd.ncl.edu.tw/handle/59785902964171039535 |
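The abstract names two baseline methods, S-shape placement and random assignment, without describing them step by step. The following is a minimal Python sketch based on an assumed common reading of these methods; it is not taken from the thesis. In this reading, S-shape (serpentine) placement ranks the samples and deals them to groups left to right, then right to left, in alternating rounds. Each multidimensional sample is reduced to the mean of its attributes, which is the weakness the abstract points out. Random assignment is sketched as a shuffle followed by round-robin dealing. The function names `s_shape_placement`, `random_assignment`, and `group_means`, as well as the toy data, are hypothetical illustrations. This is not the Ripple algorithm, which the abstract does not specify.

```python
import random
from statistics import mean
from typing import List, Sequence


def s_shape_placement(samples: Sequence[Sequence[float]], k: int) -> List[List[int]]:
    """Hypothetical sketch: deal sample indices into k groups in S-shape order.

    Each multidimensional sample is collapsed to one score (the mean of its
    attributes), and groups are filled by rank only, ignoring score gaps.
    """
    # Rank sample indices by mean attribute value, highest first.
    order = sorted(range(len(samples)), key=lambda i: mean(samples[i]), reverse=True)
    groups: List[List[int]] = [[] for _ in range(k)]
    for pos, idx in enumerate(order):
        rnd, offset = divmod(pos, k)
        # Even rounds fill groups 0..k-1; odd rounds fill k-1..0.
        g = offset if rnd % 2 == 0 else k - 1 - offset
        groups[g].append(idx)
    return groups


def random_assignment(n: int, k: int, seed: int = 0) -> List[List[int]]:
    """Hypothetical sketch: shuffle indices, then deal them round-robin into k groups."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[g::k] for g in range(k)]


def group_means(samples: Sequence[Sequence[float]], groups: List[List[int]]) -> List[List[float]]:
    """Per-group mean of each attribute; closer values across groups suggest better balance."""
    dims = len(samples[0])
    return [[mean(samples[i][d] for i in grp) for d in range(dims)] for grp in groups]


if __name__ == "__main__":
    # Toy data (hypothetical): two attributes per sample.
    scores = [[90, 80], [85, 95], [70, 60], [65, 75], [50, 55], [40, 45]]
    groups = s_shape_placement(scores, 3)
    print(groups)  # [[1, 5], [0, 4], [3, 2]]
    print(group_means(scores, groups))
    print(random_assignment(len(scores), 3, seed=42))
```

On this toy data, the S-shape groups have per-attribute means of (62.5, 70), (70, 67.5), and (67.5, 67.5). Averaging each sample's attributes hides the fact that the first group is weaker on the first attribute and stronger on the second. Ranking alone also treats a gap of 5 points the same as a gap of 50.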