A New Cluster Approach to Generating Balanced Clusters

Bibliographic Details
Main Authors: Hua, Po-Wei (華柏維)
Other Authors: Tsai, Cheng-Jung
Format: Others
Language: zh-TW
Published: 2016
Online Access: http://ndltd.ncl.edu.tw/handle/59785902964171039535
Description
Summary: Master's thesis === National Changhua University of Education (國立彰化師範大學) === Department of Mathematics === 104 === Clustering is a common data mining technique. It partitions a set of samples into subsets, each of which forms a cluster. The guiding principle is that the samples within a cluster are similar to one another yet dissimilar to the samples in other clusters. In other words, each cluster is internally homogeneous, while different clusters are heterogeneous with respect to one another. Clustering has been widely used in applications such as biotechnology, heterogeneous networks, business finance, information retrieval, and economics. In everyday settings, however, a user may instead want "balanced clusters". In contrast to traditional clustering, balanced clustering aims to make the samples within each cluster highly heterogeneous while keeping the clusters themselves highly similar to one another. Balanced clustering has practical uses such as planning a balanced diet, assigning jobs, normal class grouping, and cluster sampling. Nevertheless, according to our survey of the data mining literature, no research on balanced clustering has been published. At present, the two common ways to generate balanced clusters are the "random assignment method" and the "S-shape placement method". The S-shape placement method has two main weaknesses. First, it builds balanced clusters from the ranking of the samples alone and ignores the actual differences between them. Second, it handles multidimensional data by reducing each sample to its mean. The random assignment method, for its part, does not reliably generate well-balanced clusters. To address these problems, this work proposes a new balanced clustering algorithm named Ripple. Experimental results show that Ripple outperforms both the S-shape placement method and the random assignment method at generating well-balanced clusters.
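The two baseline methods named in the summary can be illustrated with a minimal Python sketch. It rests on assumptions that the abstract does not confirm: "S-shape placement" is taken to mean the usual serpentine (snake) ordering, in which samples are ranked by the mean of their features and dealt into groups back and forth, and the function names `s_shape_assign` and `random_assign` are hypothetical. Ripple itself is not sketched because the abstract does not describe how it works.

```python
import random
from statistics import mean


def s_shape_assign(samples, k):
    """Assumed serpentine placement: rank by per-sample mean, deal into k groups.

    samples: list of numeric tuples (one tuple per sample).
    Returns k lists of sample indices.
    """
    # Multidimensional samples are collapsed to a single score, their mean.
    order = sorted(range(len(samples)), key=lambda i: mean(samples[i]), reverse=True)
    groups = [[] for _ in range(k)]
    for pos, idx in enumerate(order):
        rnd, offset = divmod(pos, k)
        # Even passes go left to right, odd passes go right to left.
        g = offset if rnd % 2 == 0 else k - 1 - offset
        groups[g].append(idx)
    return groups


def random_assign(n, k, seed=None):
    """Shuffle n sample indices and split them into k groups of near-equal size."""
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[g::k] for g in range(k)]


# Evenly spaced scores: serpentine placement balances the group sums exactly.
even = [(90,), (80,), (70,), (60,), (50,), (40,)]
print(s_shape_assign(even, 3))    # [[0, 5], [1, 4], [2, 3]]

# Skewed scores: only the ranks are used, so the group sums drift apart.
skewed = [(100,), (51,), (50,), (49,), (48,), (0,)]
groups = s_shape_assign(skewed, 2)
print([sum(skewed[i][0] for i in g) for g in groups])    # [197, 101]
```

The second call shows the first weakness the summary describes: because only the rank order matters, the large gaps between the top and bottom scores are ignored and the two groups end up with sums of 197 and 101.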