Unsupervised Clustering Techniques Based on Genetic Algorithms

Ph.D. === Tamkang University === Doctoral Program, Department of Computer Science and Information Engineering === 93 === In most real-life situations the number of clusters in a data set is not known, and no clustering system can efficiently and automatically form the natural groups of the input patterns in such situations. Such difficult problems are called unsupervised clusterin...


Bibliographic Details
Main Authors:	Fu-Wen Yang, 楊富文
Other Authors:	Hwei-Jen Lin
Format:	Others
Language:	en_US
Published:	2005
Online Access:	http://ndltd.ncl.edu.tw/handle/82461999139557420974

id	ndltd-TW-093TKU05392050
record_format	oai_dc
spelling	ndltd-TW-093TKU05392050 2015-10-13T11:57:26Z http://ndltd.ncl.edu.tw/handle/82461999139557420974 Unsupervised Clustering Techniques Based on Genetic Algorithms 以基因演算法為基礎之非監督式分群技術 Fu-Wen Yang 楊富文 Ph.D., Tamkang University, Doctoral Program, Department of Computer Science and Information Engineering, 93 Hwei-Jen Lin 林慧珍 2005 thesis 70 en_US
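This record's description states that both strategies select data points as candidate cluster centers, encode them in a binary chromosome, and precompute a table of all pairwise distances to speed up fitness evaluation, with the Davies-Bouldin index measuring cluster validity. The following is a minimal sketch of how those pieces could fit together, assuming bit i of the chromosome marks data point i as a center, points are assigned to their nearest center, and the Davies-Bouldin index is used directly as the (minimized) fitness. The function names `distance_table`, `decode`, and `davies_bouldin` are hypothetical, and the thesis's own crossover and mutation operators are not reproduced because the record does not describe them.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def distance_table(points: Sequence[Point]) -> List[List[float]]:
    """Precompute Euclidean distances between all pairs of points once,
    so every later fitness evaluation is pure table lookups."""
    n = len(points)
    table = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            table[i][j] = table[j][i] = math.dist(points[i], points[j])
    return table


def decode(chromosome: Sequence[int]) -> List[int]:
    """Assumed encoding: bit i == 1 means data point i is a cluster center,
    so the number of clusters is the number of 1 bits."""
    return [i for i, bit in enumerate(chromosome) if bit]


def davies_bouldin(chromosome: Sequence[int], table: List[List[float]]) -> float:
    """Davies-Bouldin index of the clustering encoded by `chromosome`
    (lower is better). Centers are data points, so every distance comes
    from `table` and no center coordinates are ever computed."""
    centers = decode(chromosome)
    if len(centers) < 2:
        return math.inf  # the index is undefined for fewer than two clusters
    if any(table[a][b] == 0.0 for a in centers for b in centers if a != b):
        return math.inf  # coincident centers would leave a cluster empty
    # Assumed rule: assign each point to its nearest center.
    members = {c: [] for c in centers}
    for p in range(len(table)):
        members[min(centers, key=lambda c: table[p][c])].append(p)
    # S_c: average distance from the members of cluster c to its center.
    scatter = {c: sum(table[p][c] for p in ps) / len(ps) for c, ps in members.items()}
    # For each cluster, keep its largest similarity ratio to any other cluster.
    worst = [
        max((scatter[a] + scatter[b]) / table[a][b] for b in centers if b != a)
        for a in centers
    ]
    return sum(worst) / len(worst)
```

For example, with the points `(0, 0)`, `(0, 1)`, `(10, 0)`, `(10, 1)`, the chromosome `[1, 0, 1, 0]` (one center in each tight pair) scores 0.1, while `[1, 1, 0, 0]` (both centers in the same pair) scores 10.0, so the better partition gets the lower index. Building the table costs O(n²) time and memory once, which is the trade-off the description accepts in exchange for cheap repeated fitness evaluations.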
collection	NDLTD
sources	NDLTD
description	Ph.D. === Tamkang University === Doctoral Program, Department of Computer Science and Information Engineering === 93 === In most real-life situations the number of clusters in a data set is not known, and no clustering system can efficiently and automatically form the natural groups of the input patterns in such situations. Such difficult problems are called unsupervised or non-parametric clustering, and evolutionary approaches are often employed to produce appropriate clustering results. The best-known evolutionary techniques are genetic algorithms (GAs). In this thesis, we propose two evolutionary strategies for unsupervised clustering. One is GA-based; the other is based on population Markov chain modeling and improves on Yong Gao's algorithm, which transforms the evolutionary process of the population in the canonical genetic algorithm into a per-gene Markov chain probability model. In both strategies, our algorithms select points from the data set as candidate cluster centers and adopt a binary representation to encode the cluster centers, and thus the number of clusters. To speed up fitness evaluation, a lookup table of distances between all pairs of data points is built in advance. In the first strategy, more effective crossover and mutation operators are introduced. Because we use a binary representation rather than a real-number string encoding, we avoid a great deal of floating-point computation during GA operations (e.g., reproduction, crossover, and mutation). In the second strategy, the algorithms produce each new chromosome from the per-gene Markov chain probabilities, without any conventional GA operators. As a result, the proposed algorithms have a lower computational cost than other GA-based clustering algorithms. Finally, the Davies-Bouldin index is employed to measure the validity of the clusters.
The experimental results show that the proposed algorithms achieve better performance in less computation time than other genetic clustering algorithms.
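The second strategy in the description replaces crossover and mutation with sampling each gene from its own probability. The record does not give the transition probabilities of the thesis's population Markov chain model (or of Yong Gao's algorithm), so the sketch below swaps in a plainly different, well-known rule: a PBIL-style (population-based incremental learning) update that nudges each gene's probability toward the best chromosome of the current generation. Because each generation's probability vector depends only on the previous vector and the samples drawn from it, the vectors form a Markov chain, but this is an illustration of the idea, not the author's method. The names `sample` and `evolve` and every parameter default are hypothetical.

```python
import random
from typing import Callable, List, Optional, Sequence


def sample(probs: Sequence[float], rng: random.Random) -> List[int]:
    """Draw one chromosome: gene i is 1 with probability probs[i], independently."""
    return [1 if rng.random() < p else 0 for p in probs]


def evolve(
    fitness: Callable[[List[int]], float],
    n_genes: int,
    pop_size: int = 30,
    generations: int = 100,
    rate: float = 0.1,
    seed: int = 0,
) -> List[int]:
    """Minimize `fitness` by evolving a per-gene probability vector.

    New chromosomes are sampled from the vector; no crossover or mutation is used.
    Assumption: a PBIL-style update, not the thesis's exact Markov chain model.
    """
    rng = random.Random(seed)
    probs = [0.5] * n_genes  # start with no preference for 0 or 1
    best: Optional[List[int]] = None
    best_fit = float("inf")
    for _ in range(generations):
        population = [sample(probs, rng) for _ in range(pop_size)]
        gen_best = min(population, key=fitness)
        gen_fit = fitness(gen_best)
        if gen_fit < best_fit:
            best, best_fit = gen_best, gen_fit
        # Shift each gene's probability toward its bit in this generation's best chromosome.
        probs = [(1 - rate) * p + rate * bit for p, bit in zip(probs, gen_best)]
        # Clamp so that no gene becomes fixed and sampling keeps exploring.
        probs = [min(max(p, 0.02), 0.98) for p in probs]
    assert best is not None  # generations >= 1 guarantees at least one candidate
    return best
```

In a clustering setting, `fitness` would score a candidate-center bit string with a validity measure such as the Davies-Bouldin index; a toy objective such as the Hamming distance to a fixed bit pattern is enough to watch the probabilities converge. Each generation costs only `pop_size` random draws per gene plus the fitness calls, which is where the description's claimed savings over operator-based GAs would come from.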
author2	Hwei-Jen Lin
author	Fu-Wen Yang 楊富文
title	Unsupervised Clustering Techniques Based on Genetic Algorithms

Similar Items