Fast Clustering Algorithms for Compact Data

Doctoral === National Taiwan Ocean University === Department of Computer Science and Engineering === 100 === In this dissertation, three types of algorithms (partition-based clustering, hierarchical divisive clustering, and hierarchical agglomerative clustering) are developed to speed up clustering for compact data. These methods exploit the relationships between data...

Full description

Bibliographic Details
Main Authors: Tsung-Jen Huang, 黃崇仁
Other Authors: Jim Z. C. Lai
Format: Others
Language: en_US
Published: 2011
Online Access: http://ndltd.ncl.edu.tw/handle/81593670983905576013
id ndltd-TW-100NTOU5394002
record_format oai_dc
spelling ndltd-TW-100NTOU5394002 2015-10-13T22:01:07Z http://ndltd.ncl.edu.tw/handle/81593670983905576013 Fast Clustering Algorithms for Compact Data 緊密資料之快速分群演算法 Tsung-Jen Huang 黃崇仁 Doctoral, National Taiwan Ocean University, Department of Computer Science and Engineering, 100 Jim Z. C. Lai 賴榮滄 2011 Thesis (學位論文) 90 en_US
collection NDLTD
language en_US
format Others
sources NDLTD
description Doctoral === National Taiwan Ocean University === Department of Computer Science and Engineering === 100 === In this dissertation, three types of algorithms (partition-based clustering, hierarchical divisive clustering, and hierarchical agglomerative clustering) are developed to speed up clustering for compact data. These methods exploit the relationships between data objects and cluster representatives to speed up the clustering process. Compared with available approaches, the proposed methods reduce the computational complexity significantly while obtaining the same clustering quality.

A partition-based clustering method, "fast k-means clustering using center displacement," is proposed to speed up k-means clustering while keeping the same clustering quality by using the displacement of each cluster center between two successive iterations. The experimental results show that, compared with a full search, the proposed partition-based method reduces the computing time by a factor of 1.37 to 4.39 on data sets generated from six real images.

For hierarchical divisive clustering, a fast global k-means algorithm, modified fast global k-means (MFGKM), is proposed to speed up global k-means clustering by making use of the cluster membership and geometrical information of each data point. The proposed method can obtain the least clustering distortion. Compared with modified global k-means (MGKM), MFGKM with T = 2, where T is the number of repetitions of calculating the cluster centers, reduces the computing time and the number of distance calculations by factors of 1.31 to 1.75 and 25.78 to 45.31, respectively, on data sets generated from three images.

Finally, for hierarchical agglomerative clustering, a new algorithm is developed to reduce the computational complexity of Ward's method. The double linked algorithm (DLA) significantly reduces the computing time of the fast pairwise nearest neighbor (FPNN) algorithm, a fast version of Ward's method, but it only produces an approximate solution of hierarchical agglomerative clustering. The proposed dynamic k-nearest-neighbor algorithm (DKNNA) uses a dynamic k-nearest-neighbor list to avoid determining a cluster's nearest neighbor at some steps of the cluster merging, and it resolves the non-optimal solutions produced by DLA: unlike DLA, the proposed method is an exact version of Ward's method. The experimental results show that DKNNA obtains almost the same clustering result as FPNN. For example, compared with DLA+FS, DKNNA+FS decreases the average mean square error by 0.72% on the data set generated from the image "Lena" with N = 16384 and M = 256, where FS is the fast search algorithm for finding nearest neighbors, N is the number of data objects, and M is the number of clusters.

The experiments also show that the proposed algorithms share the same characteristic: their performance advantage becomes more remarkable as the data set grows. The partition-based clustering algorithm requires the least computing time, whereas the hierarchical divisive and hierarchical agglomerative clustering algorithms achieve better clustering quality. All of the proposed algorithms focus on reducing the computational complexity while retaining the same clustering quality.
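The partition-based contribution rests on one idea: if the displacement of each cluster center between two successive iterations shows that a point cannot have changed its nearest center, the full distance search for that point can be skipped. The sketch below illustrates that idea with triangle-inequality bounds. It is a minimal illustration in that spirit (closer to Hamerly-style bounds), not the dissertation's exact algorithm; the function name `displacement_kmeans` and its parameters are invented for this example.

```python
# Sketch: k-means that uses per-iteration center displacements to skip
# distance computations.  Illustrative only, not the dissertation's method.
import numpy as np

def displacement_kmeans(X, k, iters=50, seed=0):
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)].copy()

    # Full assignment on the first pass: distance from every point to every center.
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    d_best = dists[np.arange(len(X)), labels]          # distance to own center
    d_second = np.partition(dists, 1, axis=1)[:, 1]    # distance to the runner-up

    for _ in range(iters):
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        delta = np.linalg.norm(new_centers - centers, axis=1)   # center displacements
        centers = new_centers

        # Triangle-inequality test: a point keeps its cluster if even the
        # pessimistic new distance to its own center beats the optimistic
        # new distance to any rival center.
        upper = d_best + delta[labels]
        lower = d_second - delta.max()
        stay = upper < lower

        recompute = ~stay
        if np.any(recompute):
            d = np.linalg.norm(X[recompute, None, :] - centers[None, :, :], axis=2)
            labels[recompute] = d.argmin(axis=1)
            d_best[recompute] = d.min(axis=1)
            d_second[recompute] = np.partition(d, 1, axis=1)[:, 1]
        # Points that stayed only have their cached bounds loosened, not recomputed.
        d_best[stay] = upper[stay]
        d_second[stay] = lower[stay]
    return labels, centers
```

On compact data the `stay` mask quickly covers most points, which is the regime where speedups over a full search of all centers come from.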
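The hierarchical divisive method builds on global k-means, which grows the solution one center at a time. The sketch below shows the fast global k-means skeleton that such methods accelerate: each data point is scored by the squared-error reduction it would guarantee as the next center, and the best candidate seeds another k-means pass. The membership- and geometry-based pruning that MFGKM adds (and the parameter T from the abstract) is not reproduced here; `kmeans_fn` is a stand-in for any standard k-means routine.

```python
# Sketch of the fast global k-means skeleton; the MFGKM-specific pruning is omitted.
import numpy as np

def fast_global_kmeans(X, k_max, kmeans_fn):
    """kmeans_fn(X, init_centers) -> (labels, centers); any standard k-means works."""
    X = np.asarray(X, dtype=float)
    centers = X.mean(axis=0, keepdims=True)            # the optimal 1-cluster solution
    for k in range(2, k_max + 1):
        # d_near[n]: squared distance from point n to its nearest current center.
        d_near = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2).min(axis=1)
        # Guaranteed error reduction if candidate point n is inserted as a new center
        # (computed from all pairwise distances; O(N^2), kept simple for clarity).
        pair = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
        gain = np.maximum(d_near[None, :] - pair, 0.0).sum(axis=1)
        best = gain.argmax()
        init = np.vstack([centers, X[best]])
        _, centers = kmeans_fn(X, init)                 # refine the k-center solution
    return centers
```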
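The agglomerative contribution targets the nearest-neighbor searches inside Ward's method. The baseline below spells out the pairwise nearest neighbor loop and Ward's merge cost; the exhaustive pair scan in each iteration is the work that FPNN's cached nearest-neighbor table, DLA, and DKNNA's dynamic k-nearest-neighbor lists are designed to avoid. This is an illustrative skeleton under that reading of the abstract, not the dissertation's DKNNA; `ward_cost` and `pnn_ward` are names invented for the example.

```python
# Sketch: brute-force pairwise nearest neighbor clustering with Ward's merge cost.
import numpy as np

def ward_cost(a, b):
    """Increase in within-cluster squared error if clusters a and b are merged."""
    na, nb = a["n"], b["n"]
    diff = a["mean"] - b["mean"]
    return (na * nb) / (na + nb) * float(diff @ diff)

def pnn_ward(X, m_target):
    X = np.asarray(X, dtype=float)
    clusters = {i: {"n": 1, "mean": x, "members": [i]} for i, x in enumerate(X)}
    while len(clusters) > m_target:
        # Exhaustive search for the cheapest merge -- the step that cached
        # nearest-neighbor lists are meant to prune.
        ids = list(clusters)
        _, a, b = min((ward_cost(clusters[p], clusters[q]), p, q)
                      for i, p in enumerate(ids) for q in ids[i + 1:])
        ca, cb = clusters.pop(a), clusters.pop(b)
        n = ca["n"] + cb["n"]
        clusters[a] = {"n": n,
                       "mean": (ca["n"] * ca["mean"] + cb["n"] * cb["mean"]) / n,
                       "members": ca["members"] + cb["members"]}
    return list(clusters.values())
```

Per the abstract, keeping a dynamic list of each cluster's k nearest neighbors means that when a cached nearest neighbor is consumed by a merge, a valid replacement is often already on the list, so the full rescan above can frequently be skipped without giving up the exactness of Ward's method.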
author2 Jim Z. C., Lai
author_facet Jim Z. C., Lai
Tsung-Jen Huang
黃崇仁
author Tsung-Jen Huang
黃崇仁
spellingShingle Tsung-Jen Huang
黃崇仁
Fast Clustering Algorithms for Compact Data
author_sort Tsung-Jen Huang
title Fast Clustering Algorithms for Compact Data
title_short Fast Clustering Algorithms for Compact Data
title_full Fast Clustering Algorithms for Compact Data
title_fullStr Fast Clustering Algorithms for Compact Data
title_full_unstemmed Fast Clustering Algorithms for Compact Data
title_sort fast clustering algorithms for compact data
publishDate 2011
url http://ndltd.ncl.edu.tw/handle/81593670983905576013
work_keys_str_mv AT tsungjenhuang fastclusteringalgorithmsforcompactdata
AT huángchóngrén fastclusteringalgorithmsforcompactdata
AT tsungjenhuang jǐnmìzīliàozhīkuàisùfēnqúnyǎnsuànfǎ
AT huángchóngrén jǐnmìzīliàozhīkuàisùfēnqúnyǎnsuànfǎ
_version_ 1718071620192436224