Unsupervised Clustering Techniques Based on Genetic Algorithms


Bibliographic Details
Main Authors: Fu-Wen Yang, 楊富文
Other Authors: Hwei-Jen Lin
Format: Others
Language: en_US
Published: 2005
Online Access: http://ndltd.ncl.edu.tw/handle/82461999139557420974
Description
Summary: Ph.D. === Tamkang University === Doctoral Program, Department of Computer Science and Information Engineering === 93 === In most real-life situations the number of clusters in a data set is not known in advance, and no clustering system can efficiently and automatically form natural groups of the input patterns under these conditions. Such difficult problems are called unsupervised clustering or non-parametric clustering, and evolutionary approaches are often employed to obtain appropriate clustering results for them. The best-known evolutionary techniques are genetic algorithms (GAs). In this thesis, we propose two evolutionary strategies for unsupervised clustering. The first is GA-based. The second is based on population Markov chain modeling: it improves on Yong Gao's algorithm, which transforms the evolutionary process of the population in the canonical genetic algorithm into a Markov chain probability model for each gene. In both strategies, our algorithms select data points from the data set as candidate cluster centers and adopt a binary representation to encode the number of cluster centers. To speed up the evaluation of the fitness function, a lookup table of the distances between all pairs of data points is built in advance. In the first strategy, more effective crossover and mutation operators are introduced. Because we use a binary representation rather than a string representation (real-number encoding), we save a great deal of floating-point computation time during the GA operations (e.g. reproduction, crossover, and mutation). In the second strategy, the algorithms produce each new chromosome according to the Markov chain probability model for each gene, without any conventional GA operators. Hence, the proposed algorithms incur lower computational cost than other GA-based clustering algorithms. Finally, the Davies-Bouldin index is employed to measure the validity of the clusters.
The experimental results demonstrate the superiority of the proposed algorithms: they achieve better clustering performance in less computation time than other genetic clustering algorithms.
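
The summary names three ingredients that can be illustrated together: a binary chromosome that selects data points as cluster centers, a precomputed table of pairwise distances, and the Davies-Bouldin index as the validity measure. The Python sketch below is only an illustration under stated assumptions, not the thesis's implementation. It assumes that bit i of the chromosome is 1 when data point i is chosen as a center, that each point joins its nearest chosen center, and that the chosen data point itself (rather than the cluster mean) stands in for the cluster centroid in the index. The function names `build_distance_table` and `davies_bouldin` are hypothetical.

```python
import math


def build_distance_table(points):
    """Precompute Euclidean distances between all pairs of data points."""
    n = len(points)
    table = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(points[i], points[j])
            table[i][j] = table[j][i] = d
    return table


def davies_bouldin(chromosome, table):
    """Davies-Bouldin index of the clustering encoded by a binary chromosome.

    Assumption: bit i == 1 means data point i serves as a cluster center.
    Lower values indicate more compact, better separated clusters.
    """
    centers = [i for i, bit in enumerate(chromosome) if bit]
    if len(centers) < 2:
        return math.inf  # the index is undefined for fewer than two clusters

    # Two coincident centers would give a zero separation; treat as invalid.
    for a in centers:
        for b in centers:
            if a != b and table[a][b] == 0.0:
                return math.inf

    # Assign every point to its nearest selected center (table lookups only).
    members = {c: [] for c in centers}
    for p in range(len(chromosome)):
        nearest = min(centers, key=lambda c: table[p][c])
        members[nearest].append(p)

    # Scatter of a cluster: mean distance from its members to its center.
    scatter = {
        c: sum(table[p][c] for p in members[c]) / len(members[c])
        for c in centers
    }

    # For each cluster take the worst (largest) similarity ratio, then average.
    total = 0.0
    for a in centers:
        total += max(
            (scatter[a] + scatter[b]) / table[a][b]
            for b in centers
            if b != a
        )
    return total / len(centers)
```

For the four points (0, 0), (0, 1), (10, 0), (10, 1), the chromosome [1, 0, 1, 0] places one center in each natural group and scores about 0.1, while [1, 1, 0, 0] places both centers in the same group and scores 10. A search procedure would therefore prefer the first chromosome. Because every distance is read from the table, evaluating a chromosome needs no further distance computation.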