Unsupervised Clustering Techniques Based on Genetic Algorithms

Ph.D. === Tamkang University === Doctoral Program, Department of Computer Science and Information Engineering === ROC academic year 93

Bibliographic Details
Main Authors: Fu-Wen Yang (楊富文)
Other Authors: Hwei-Jen Lin
Format: Others
Language: en_US
Published: 2005
Online Access: http://ndltd.ncl.edu.tw/handle/82461999139557420974
id ndltd-TW-093TKU05392050
Chinese Title: 以基因演算法為基礎之非監督式分群技術
Other Authors (Chinese name): 林慧珍 (Hwei-Jen Lin)
Type: 學位論文 (thesis)
description The number of clusters in a data set is unknown in most real-life situations, and in such situations no clustering system can efficiently and automatically form natural groups of the input patterns. Such difficult problems are called unsupervised or non-parametric clustering, and evolutionary approaches are often employed to obtain appropriate clustering results for them. The best-known evolutionary techniques are genetic algorithms (GAs). In this thesis, we propose two evolutionary strategies for unsupervised clustering. One is GA-based; the other is based on population Markov chain modeling and improves on Yong Gao's algorithm for transforming the evolutionary process of the population in the canonical genetic algorithm into a Markov chain probability model for each gene. In both strategies, our algorithms select data points from the data set as candidate cluster centers and adopt a binary representation to encode the number of cluster centers. To speed up the evaluation of the fitness function, a lookup table of the distances between all pairs of data points is computed in advance. In the first strategy, more effective crossover and mutation operators are introduced. Because we use a binary representation rather than a string (real-number) encoding, we avoid a great deal of floating-point computation during GA operations (e.g., reproduction, crossover, and mutation). In the second strategy, the algorithms produce each new chromosome from the per-gene Markov chain probabilities, without any conventional GA operators. Hence, the proposed algorithms require considerably less computation than other GA-based clustering algorithms. Finally, the Davies-Bouldin index is employed to measure the validity of the clusters.
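To make the binary encoding, the precomputed distance table, and the Davies-Bouldin evaluation concrete, here is a minimal Python sketch under stated assumptions. It does not reproduce the thesis's code. It assumes that a chromosome has one bit per data point and that a bit set to 1 marks that point as a cluster center, so the number of 1 bits is the number of clusters. Points are assigned to their nearest center. Fitness is the negated Davies-Bouldin index, computed with each chosen center point standing in for the cluster centroid so that every distance is a table lookup. All function names (`distance_table`, `decode`, `assign`, `davies_bouldin`, `fitness`) are hypothetical, and the thesis may define the encoding, assignment, or index differently.

```python
import math
from typing import List, Sequence

def distance_table(points: Sequence[Sequence[float]]) -> List[List[float]]:
    """Precompute every pairwise Euclidean distance once: the lookup table."""
    n = len(points)
    table = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            table[i][j] = table[j][i] = math.dist(points[i], points[j])
    return table

def decode(chromosome: Sequence[int]) -> List[int]:
    """Assumed encoding: bit i == 1 means data point i is a cluster center."""
    return [i for i, bit in enumerate(chromosome) if bit]

def assign(table: List[List[float]], centers: List[int]) -> List[int]:
    """Give each point the position (0..k-1) of its nearest center, using lookups only."""
    return [min(range(len(centers)), key=lambda c: row[centers[c]]) for row in table]

def davies_bouldin(table: List[List[float]], centers: List[int], labels: List[int]) -> float:
    """DB index, with each chosen center point standing in for its cluster centroid."""
    k = len(centers)
    scatter = []  # S_c: mean distance from the members of cluster c to its center
    for c in range(k):
        members = [p for p, lab in enumerate(labels) if lab == c]
        scatter.append(sum(table[p][centers[c]] for p in members) / len(members))
    total = 0.0
    for i in range(k):
        # R_i = max over j != i of (S_i + S_j) / d(center_i, center_j)
        total += max((scatter[i] + scatter[j]) / table[centers[i]][centers[j]]
                     for j in range(k) if j != i)
    return total / k

def fitness(chromosome: Sequence[int], table: List[List[float]]) -> float:
    """Higher is better: the negated DB index, or -inf for an unusable chromosome."""
    centers = decode(chromosome)
    if len(centers) < 2:
        return -math.inf  # the DB index needs at least two clusters
    labels = assign(table, centers)
    if len(set(labels)) < len(centers):
        return -math.inf  # a center attracted no points (duplicate data points)
    return -davies_bouldin(table, centers, labels)

if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    table = distance_table(pts)                # built once, reused by every evaluation
    print(fitness([1, 0, 0, 1, 0, 0], table))  # one center in each visual group
    print(fitness([1, 1, 0, 0, 0, 0], table))  # both centers in the same group
```

For n data points the table costs O(n²) memory and is built once. Every later fitness evaluation only indexes into it, which is the kind of saving the abstract attributes to the lookup table.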
Experimental results demonstrate the superiority of the proposed algorithms: they achieve better performance in less computation time than previously proposed genetic clustering algorithms.
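The abstract says the second strategy generates each new chromosome from per-gene probabilities instead of applying crossover or mutation. It does not specify how Gao's population Markov chain model, or the thesis's improvement of it, computes those probabilities. The sketch below therefore substitutes a simpler, well-known per-gene scheme: a probability vector that is nudged toward the bit frequencies of the best chromosomes in each generation, in the style of PBIL (population-based incremental learning). It illustrates only the idea of gene-by-gene sampling without GA operators and is not the thesis's method. The names `sample` and `evolve_per_gene`, and the parameters `pop_size`, `n_select`, `rate`, and the 0.02/0.98 clamp, are hypothetical.

```python
import math
import random
from typing import Callable, List, Sequence

def sample(probs: Sequence[float], rng: random.Random) -> List[int]:
    """Draw one chromosome gene by gene: bit i is 1 with probability probs[i]."""
    return [1 if rng.random() < p else 0 for p in probs]

def evolve_per_gene(fitness: Callable[[List[int]], float], n_bits: int,
                    pop_size: int = 30, n_select: int = 10, rate: float = 0.3,
                    generations: int = 60, seed: int = 0) -> List[int]:
    """PBIL-style loop (a stand-in, not Gao's model): no crossover, no mutation."""
    rng = random.Random(seed)
    probs = [0.5] * n_bits  # no initial preference for either bit value
    best, best_fit = [0] * n_bits, -math.inf
    for _ in range(generations):
        population = [sample(probs, rng) for _ in range(pop_size)]
        scored = sorted(((fitness(ch), ch) for ch in population),
                        key=lambda pair: pair[0], reverse=True)
        if scored[0][0] > best_fit:
            best_fit, best = scored[0]
        elite = [ch for _, ch in scored[:n_select]]
        for i in range(n_bits):
            freq = sum(ch[i] for ch in elite) / len(elite)
            # move each gene's probability toward its frequency among the elite
            probs[i] = (1 - rate) * probs[i] + rate * freq
            probs[i] = min(max(probs[i], 0.02), 0.98)  # keep a little exploration
    return best

if __name__ == "__main__":
    target = [1, 0, 0, 1, 0, 0, 1, 0]  # toy objective: match a fixed bit pattern
    score = lambda ch: sum(a == b for a, b in zip(ch, target))
    print(evolve_per_gene(score, n_bits=len(target)))
```

In a clustering setting, `fitness` would instead score the partition that a chromosome encodes, for example by the negated Davies-Bouldin index of the clusters around its selected centers.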