An Effective Validity Index Method for Gaussian-distributed Clusters of different sizes with various degrees of Dispersion and Overlapping

Bibliographic Details
Main Authors: Cheng-Hshueh Wu, 吳承學
Other Authors: 黃博惠
Format: Others
Language: zh-TW
Published: 2016
Online Access: http://ndltd.ncl.edu.tw/handle/94560761502870248336
Description
Summary: Master's thesis, National Chung Hsing University, Department of Computer Science and Engineering, academic year 104 (ROC calendar). A cluster validity index serves two main purposes: assessing the quality of a clustering and determining the correct number of clusters. In this thesis, we propose a cluster validity index that mitigates a weakness of the existing index VDO, which has little tolerance for datasets containing clusters of unbalanced population when estimating the correct number of clusters. Our method computes the validity index from the results of siibFCM, a clustering method that tolerates datasets with unbalanced cluster populations, combined with a dispersion measure and an overlap measure. The dispersion measure estimates the overall data density of the clusters in the dataset: smaller dispersion means that data points lie more closely together in every cluster. The overlap measure represents the overall separation between all pairs of clusters in the dataset: a low degree of overlap means that the clusters are well separated from one another. Combining these two measures yields the proposed validity index. We conducted several experiments on both artificial datasets and public real-world datasets to validate the effectiveness of the method. The results show that it can effectively and reliably estimate the correct (optimal) number of clusters even when the clusters differ widely in size, dispersion, and degree of overlap.
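The summary does not give formulas for the dispersion and overlap measures, and siibFCM is not specified here, so the following is only a minimal illustrative sketch of the general idea of a dispersion-plus-overlap validity index, not the thesis's method. It assumes standard fuzzy c-means (FCM) memberships with fuzzifier m = 2 in place of siibFCM. The dispersion term (membership-weighted mean squared distance to the centers, divided by the total data variance), the overlap term (average shared membership min(u_i, u_j) over all cluster pairs), and all function names are hypothetical choices made for illustration. Lower index values are better.

```python
from itertools import combinations


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fcm_memberships(data, centers, m=2.0):
    """Standard FCM memberships u[i][k] of point k in cluster i.

    Assumption: plain FCM stands in for siibFCM, which is not specified here.
    """
    c = len(centers)
    u = [[0.0] * len(data) for _ in range(c)]
    for k, x in enumerate(data):
        d2 = [sq_dist(x, v) for v in centers]
        if 0.0 in d2:  # point lies exactly on a center: give it crisp membership
            u[d2.index(0.0)][k] = 1.0
            continue
        for i in range(c):
            # u_ik = 1 / sum_j (d_ik / d_jk)^(2/(m-1)); with squared distances
            # the exponent becomes 1/(m-1)
            u[i][k] = 1.0 / sum((d2[i] / d2[j]) ** (1.0 / (m - 1.0)) for j in range(c))
    return u


def dispersion(data, centers, u, m=2.0):
    """Hypothetical dispersion: weighted mean squared distance / total variance."""
    n, dim = len(data), len(data[0])
    mean = [sum(x[j] for x in data) / n for j in range(dim)]
    total_var = sum(sq_dist(x, mean) for x in data) / n
    num = den = 0.0
    for i, v in enumerate(centers):
        for k, x in enumerate(data):
            w = u[i][k] ** m
            num += w * sq_dist(x, v)
            den += w
    return (num / den) / total_var


def overlap(u):
    """Hypothetical overlap: mean shared membership over all cluster pairs."""
    c, n = len(u), len(u[0])
    pairs = list(combinations(range(c), 2))
    if not pairs:  # a single cluster cannot overlap with anything
        return 0.0
    shared = sum(min(u[i][k], u[j][k]) for i, j in pairs for k in range(n))
    return shared / (len(pairs) * n)


def validity_index(data, centers, m=2.0):
    """Hypothetical index: dispersion + overlap (smaller is better)."""
    u = fcm_memberships(data, centers, m)
    return dispersion(data, centers, u, m) + overlap(u)


# Toy data: two tight, well-separated groups of four points each.
data = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
good = validity_index(data, [(0.5, 0.5), (10.5, 10.5)])  # centers at true means
bad = validity_index(data, [(5, 5), (6, 6)])             # misplaced centers
one = validity_index(data, [(5.5, 5.5)])                 # single cluster at data mean
print(good, bad, one)
```

In this toy example, centers at the true cluster means give a much lower index than misplaced centers, and a single cluster scores 1 because its dispersion equals the total variance while its overlap is zero. To estimate the number of clusters with an index of this kind, one would run the clustering for each candidate number of clusters and keep the one with the smallest index value.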