Fast Clustering Algorithms for Compact Data
Doctoral dissertation === National Taiwan Ocean University === Department of Computer Science and Engineering === Academic year 100 === In this dissertation, three types of algorithms (partition-based clustering, hierarchical divisive clustering, and hierarchical agglomerative clustering) are developed to speed up clustering for compact data. These methods exploit the relationships between data...
Main Authors: | Tsung-Jen Huang 黃崇仁 |
Other Authors: | Jim Z. C. Lai 賴榮滄 |
Format: | Others |
Language: | en_US |
Published: | 2011 |
Online Access: | http://ndltd.ncl.edu.tw/handle/81593670983905576013 |
id | ndltd-TW-100NTOU5394002
record_format | oai_dc
spelling | ndltd-TW-100NTOU5394002 2015-10-13T22:01:07Z http://ndltd.ncl.edu.tw/handle/81593670983905576013 Fast Clustering Algorithms for Compact Data 緊密資料之快速分群演算法 Tsung-Jen Huang 黃崇仁 Doctoral dissertation, National Taiwan Ocean University, Department of Computer Science and Engineering, academic year 100; abstract as given in the description field below. Advisor: Jim Z. C. Lai 賴榮滄. 2011. Thesis, 90 pages. en_US
collection | NDLTD
language | en_US
format | Others
sources | NDLTD
description |
Doctoral dissertation === National Taiwan Ocean University === Department of Computer Science and Engineering === Academic year 100 === In this dissertation, three types of algorithms (partition-based clustering, hierarchical divisive clustering, and hierarchical agglomerative clustering) are developed to speed up clustering for compact data. These methods exploit the relationships between data objects and cluster representatives to accelerate the clustering process. Compared with existing approaches, the proposed methods reduce the computational complexity significantly while obtaining the same clustering quality.
A partition-based clustering method, “fast k-means clustering using center displacement,” is proposed to speed up k-means clustering while keeping the same quality of the clustering result, by using the information of cluster-center displacement between two successive partition processes. The experimental results show that, compared with a full search, the proposed partition-based method can reduce the computing time by a factor of 1.37 to 4.39 for the data sets obtained from six real images.
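To make the center-displacement idea concrete, here is a minimal sketch, assuming a bound-maintenance scheme in the general spirit of displacement-based pruning rather than the dissertation's exact procedure: each point keeps distance bounds that are adjusted by how far the centers moved, and exact distances are recomputed only where the bounds overlap. The function name `kmeans_displacement` and all parameters are illustrative.

```python
import numpy as np

def kmeans_displacement(X, k, n_iter=50, seed=0):
    """k-means with a simple center-displacement pruning rule (illustrative).

    Each point keeps an upper bound on the distance to its assigned center and
    a lower bound on the distance to every other center.  After the centers
    move, the bounds are adjusted by the center displacements (triangle
    inequality), and exact distances are recomputed only where the bounds
    overlap, i.e. where the assignment could actually change.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    centers = X[rng.choice(n, size=k, replace=False)].astype(float)

    # Initial exact assignment and bounds.
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    assign = dists.argmin(axis=1)
    upper = dists[np.arange(n), assign]           # distance to assigned center
    dists[np.arange(n), assign] = np.inf
    lower = dists.min(axis=1)                     # distance to closest other center

    for _ in range(n_iter):
        # Recompute centers and record how far each one moved.
        new_centers = np.array([
            X[assign == j].mean(axis=0) if np.any(assign == j) else centers[j]
            for j in range(k)
        ])
        shift = np.linalg.norm(new_centers - centers, axis=1)
        centers = new_centers

        # Displacement-based bound updates.
        upper += shift[assign]
        lower -= shift.max()

        # Only points whose bounds overlap are re-examined exactly.
        unsettled = np.flatnonzero(upper > lower)
        if unsettled.size:
            d = np.linalg.norm(X[unsettled][:, None, :] - centers[None, :, :], axis=2)
            a = d.argmin(axis=1)
            rows = np.arange(unsettled.size)
            assign[unsettled] = a
            upper[unsettled] = d[rows, a]
            d[rows, a] = np.inf
            lower[unsettled] = d.min(axis=1)

        if shift.max() < 1e-12:                   # centers stopped moving
            break
    return centers, assign
```

Because a point is re-examined whenever its bounds overlap, the resulting partition matches that of a full-search k-means up to tie-breaking, which is the property the proposed method also preserves.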
For hierarchical divisive clustering, a fast global k-means algorithm, modified fast global k-means (MFGKM), is proposed to speed up global k-means clustering by making use of a data point's cluster membership and geometrical information. The proposed method can obtain the lowest clustering distortion. Compared to modified global k-means (MGKM), MFGKM with T = 2, where T is the number of repetitions of the cluster-center calculation, can reduce the computing time and the number of distance calculations by factors of 1.31 to 1.75 and 25.78 to 45.31, respectively, for the data sets generated from three images.
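For context, the sketch below shows a generic fast global k-means skeleton of the kind MFGKM refines: centers are added one at a time, each candidate insertion point is scored by its guaranteed distortion reduction, and a standard k-means run polishes the enlarged center set. The helper `kmeans_fn` and the full pairwise-distance matrix are simplifying assumptions for illustration; the dissertation's MFGKM further exploits cluster membership and geometrical information to avoid such costs.

```python
import numpy as np

def fast_global_kmeans(X, k_max, kmeans_fn):
    """Skeleton of fast global k-means: grow the solution one center at a time.

    For each candidate insertion point x_n, b[n] is the guaranteed reduction in
    total distortion obtained by placing a new center at x_n (every point whose
    current squared distance d[j] exceeds ||x_j - x_n||^2 would switch to the
    new center).  The best candidate seeds the next k-means refinement, where
    kmeans_fn(X, init_centers) is any routine returning refined centers.
    """
    # O(n^2) memory: acceptable only for a small illustrative data set.
    pairwise = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)

    centers = X.mean(axis=0, keepdims=True)        # k = 1: the global centroid
    for _ in range(1, k_max):
        # Squared distance of every point to its current nearest center.
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2).min(axis=1)
        # Guaranteed distortion reduction for inserting a center at each point.
        b = np.maximum(d[None, :] - pairwise, 0.0).sum(axis=1)
        best = X[int(b.argmax())]
        # Refine with a standard k-means routine started from these centers.
        centers = kmeans_fn(X, np.vstack([centers, best]))
    return centers
```

A thin wrapper such as `lambda X, C: scipy.cluster.vq.kmeans2(X, C, minit='matrix')[0]` could serve as `kmeans_fn`, since `kmeans2` accepts an array of initial centers.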
Finally, for hierarchical agglomerative clustering, a new algorithm is developed to reduce the computational complexity of Ward's method. The double linked algorithm (DLA) can significantly reduce the computing time of the fast pairwise nearest neighbor (FPNN) algorithm, a fast version of Ward's method, but it does so by producing an approximate solution of hierarchical agglomerative clustering. The proposed approach, the dynamic k-nearest-neighbor algorithm (DKNNA), uses a dynamic k-nearest-neighbor list to avoid determining a cluster's nearest neighbor at some steps of the cluster merging, and it resolves DLA's problem of delivering a non-optimal solution: the proposed method is an exact version of Ward's method, whereas DLA is not. The experimental results show that DKNNA obtains almost the same clustering result as FPNN. For example, compared with DLA+FS, DKNNA+FS can decrease the average mean squared error by 0.72% on the data set generated from the image “Lena” with N = 16384 and M = 256, where FS is the fast search algorithm for finding nearest neighbors, N is the number of data objects, and M is the number of clusters.
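As a rough illustration of the pairwise-nearest-neighbor framework in which FPNN, DLA, and DKNNA operate, the sketch below performs Ward-criterion agglomerative clustering while keeping one nearest-neighbor pointer per cluster; after each merge, only the clusters whose pointer was invalidated are re-scanned. This is a generic sketch (with illustrative names such as `pnn_ward`), not DKNNA's dynamic k-nearest-neighbor list, which keeps several candidate neighbors per cluster so that even fewer nearest-neighbor determinations are needed.

```python
import numpy as np

def pnn_ward(X, m_target):
    """Pairwise-nearest-neighbor (Ward) clustering with a nearest-neighbor table.

    Every live cluster keeps its centroid, its size, and a pointer to the
    cluster it would be cheapest to merge with under Ward's merge cost.  After
    a merge, only clusters whose pointer referred to one of the merged clusters
    are re-scanned; Ward's criterion is reducible, so the result stays exact.
    Assumes m_target >= 1.
    """
    centroids = [x.astype(float) for x in X]
    sizes = [1] * len(X)
    alive = set(range(len(X)))

    def merge_cost(a, b):
        diff = centroids[a] - centroids[b]
        return sizes[a] * sizes[b] / (sizes[a] + sizes[b]) * float(diff @ diff)

    def nearest(a):
        best, best_cost = None, np.inf
        for b in alive:
            if b != a:
                c = merge_cost(a, b)
                if c < best_cost:
                    best, best_cost = b, c
        return best, best_cost

    nn = {a: nearest(a) for a in alive}
    while len(alive) > m_target:
        a = min(alive, key=lambda c: nn[c][1])     # cheapest merge overall
        b = nn[a][0]
        # Merge b into a: size-weighted centroid, combined size.
        centroids[a] = (sizes[a] * centroids[a] + sizes[b] * centroids[b]) / (sizes[a] + sizes[b])
        sizes[a] += sizes[b]
        alive.remove(b)
        del nn[b]
        if len(alive) <= m_target:
            break
        # Re-scan only clusters whose nearest neighbor was a or b.
        stale = {a} | {c for c in alive if nn[c][0] in (a, b)}
        for c in stale:
            nn[c] = nearest(c)
    return np.array([centroids[i] for i in sorted(alive)])
```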
The experiments show that the proposed algorithms share a common characteristic: their performance advantage becomes more pronounced as the data set grows larger. Note that the partition-based clustering algorithms require the least computing time, whereas the hierarchical divisive and hierarchical agglomerative clustering algorithms achieve better clustering quality. All of the proposed algorithms focus on reducing the computational complexity while retaining the same clustering quality.
author2 | Jim Z. C. Lai
author_facet | Jim Z. C. Lai; Tsung-Jen Huang 黃崇仁
author | Tsung-Jen Huang 黃崇仁
author_sort | Tsung-Jen Huang
title | Fast Clustering Algorithms for Compact Data
title_sort | fast clustering algorithms for compact data
publishDate | 2011
url | http://ndltd.ncl.edu.tw/handle/81593670983905576013
work_keys_str_mv | AT tsungjenhuang fastclusteringalgorithmsforcompactdata AT huángchóngrén fastclusteringalgorithmsforcompactdata AT tsungjenhuang jǐnmìzīliàozhīkuàisùfēnqúnyǎnsuànfǎ AT huángchóngrén jǐnmìzīliàozhīkuàisùfēnqúnyǎnsuànfǎ
_version_ | 1718071620192436224