Fast Clustering Algorithms for Compact Data
Doctoral dissertation === National Taiwan Ocean University === Department of Computer Science and Engineering === Academic year 100 === In this dissertation, three types of algorithms (partition-based clustering, hierarchical divisive clustering, and hierarchical agglomerative clustering) are developed to speed up clustering for compact data. These methods exploit the relationships between data...
Main Authors: | Tsung-Jen Huang 黃崇仁 |
Other Authors: | Jim Z. C. Lai 賴榮滄 |
Format: | Others |
Language: | en_US |
Published: | 2011 |
Online Access: | http://ndltd.ncl.edu.tw/handle/81593670983905576013 |
id | ndltd-TW-100NTOU5394002
record_format | oai_dc
spelling | ndltd-TW-100NTOU5394002 2015-10-13T22:01:07Z http://ndltd.ncl.edu.tw/handle/81593670983905576013 Fast Clustering Algorithms for Compact Data 緊密資料之快速分群演算法 Tsung-Jen Huang 黃崇仁 Doctoral dissertation, National Taiwan Ocean University, Department of Computer Science and Engineering, academic year 100; abstract as given in the description field below. Advisor: Jim Z. C. Lai 賴榮滄. 2011. Thesis, 90 pages. en_US
collection | NDLTD
language | en_US
format | Others
sources | NDLTD
description |
Doctoral dissertation === National Taiwan Ocean University === Department of Computer Science and Engineering === Academic year 100 === In this dissertation, three types of algorithms (partition-based clustering, hierarchical divisive clustering, and hierarchical agglomerative clustering) are developed to speed up clustering for compact data. These methods exploit the relationships between data objects and cluster representatives to accelerate the clustering process. Compared with existing approaches, the proposed methods reduce the computational complexity significantly while obtaining the same clustering quality.
A partition-based clustering method, “fast k-means clustering using center displacement,” is proposed to speed up k-means clustering while keeping the same quality of the clustering result, by using the information of cluster-center displacement between two successive partition processes. The experimental results show that, compared with a full search, the proposed partition-based method can reduce the computing time by a factor of 1.37 to 4.39 for the data sets obtained from six real images.
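To make the center-displacement idea concrete, here is a minimal sketch, assuming a bound-maintenance scheme in the general spirit of displacement-based pruning rather than the dissertation's exact procedure: each point keeps distance bounds that are adjusted by how far the centers moved, and exact distances are recomputed only where the bounds overlap. The function name `kmeans_displacement` and all parameters are illustrative.

```python
import numpy as np

def kmeans_displacement(X, k, n_iter=50, seed=0):
    """k-means with a simple center-displacement pruning rule (illustrative).

    Each point keeps an upper bound on the distance to its assigned center and
    a lower bound on the distance to every other center.  After the centers
    move, the bounds are adjusted by the center displacements (triangle
    inequality), and exact distances are recomputed only where the bounds
    overlap, i.e. where the assignment could actually change.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    centers = X[rng.choice(n, size=k, replace=False)].astype(float)

    # Initial exact assignment and bounds.
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    assign = dists.argmin(axis=1)
    upper = dists[np.arange(n), assign]           # distance to assigned center
    dists[np.arange(n), assign] = np.inf
    lower = dists.min(axis=1)                     # distance to closest other center

    for _ in range(n_iter):
        # Recompute centers and record how far each one moved.
        new_centers = np.array([
            X[assign == j].mean(axis=0) if np.any(assign == j) else centers[j]
            for j in range(k)
        ])
        shift = np.linalg.norm(new_centers - centers, axis=1)
        centers = new_centers

        # Displacement-based bound updates.
        upper += shift[assign]
        lower -= shift.max()

        # Only points whose bounds overlap are re-examined exactly.
        unsettled = np.flatnonzero(upper > lower)
        if unsettled.size:
            d = np.linalg.norm(X[unsettled][:, None, :] - centers[None, :, :], axis=2)
            a = d.argmin(axis=1)
            rows = np.arange(unsettled.size)
            assign[unsettled] = a
            upper[unsettled] = d[rows, a]
            d[rows, a] = np.inf
            lower[unsettled] = d.min(axis=1)

        if shift.max() < 1e-12:                   # centers stopped moving
            break
    return centers, assign
```

Because a point is re-examined whenever its bounds overlap, the resulting partition matches that of a full-search k-means up to tie-breaking, which is the property the proposed method also preserves.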
For hierarchical divisive clustering, a fast global k-means algorithm, modified fast global k-means (MFGKM), is proposed to speed up global k-means clustering by making use of a data point's cluster membership and geometrical information. The proposed method can obtain the lowest clustering distortion. Compared to modified global k-means (MGKM), MFGKM with T = 2, where T is the number of repetitions of the cluster-center calculation, can reduce the computing time and the number of distance calculations by factors of 1.31 to 1.75 and 25.78 to 45.31, respectively, for the data sets generated from three images.
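For context, the sketch below shows a generic fast global k-means skeleton of the kind MFGKM refines: centers are added one at a time, each candidate insertion point is scored by its guaranteed distortion reduction, and a standard k-means run polishes the enlarged center set. The helper `kmeans_fn` and the full pairwise-distance matrix are simplifying assumptions for illustration; the dissertation's MFGKM further exploits cluster membership and geometrical information to avoid such costs.

```python
import numpy as np

def fast_global_kmeans(X, k_max, kmeans_fn):
    """Skeleton of fast global k-means: grow the solution one center at a time.

    For each candidate insertion point x_n, b[n] is the guaranteed reduction in
    total distortion obtained by placing a new center at x_n (every point whose
    current squared distance d[j] exceeds ||x_j - x_n||^2 would switch to the
    new center).  The best candidate seeds the next k-means refinement, where
    kmeans_fn(X, init_centers) is any routine returning refined centers.
    """
    # O(n^2) memory: acceptable only for a small illustrative data set.
    pairwise = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)

    centers = X.mean(axis=0, keepdims=True)        # k = 1: the global centroid
    for _ in range(1, k_max):
        # Squared distance of every point to its current nearest center.
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2).min(axis=1)
        # Guaranteed distortion reduction for inserting a center at each point.
        b = np.maximum(d[None, :] - pairwise, 0.0).sum(axis=1)
        best = X[int(b.argmax())]
        # Refine with a standard k-means routine started from these centers.
        centers = kmeans_fn(X, np.vstack([centers, best]))
    return centers
```

A thin wrapper such as `lambda X, C: scipy.cluster.vq.kmeans2(X, C, minit='matrix')[0]` could serve as `kmeans_fn`, since `kmeans2` accepts an array of initial centers.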
Finally, for hierarchical agglomerative clustering, a new algorithm is developed to reduce the computational complexity of Ward's method. The double linked algorithm (DLA) can significantly reduce the computing time of the fast pairwise nearest neighbor (FPNN) algorithm, a fast version of Ward's method, but it does so by producing an approximate solution of hierarchical agglomerative clustering. The proposed approach, the dynamic k-nearest-neighbor algorithm (DKNNA), uses a dynamic k-nearest-neighbor list to avoid determining a cluster's nearest neighbor at some steps of the cluster merging, and it resolves DLA's problem of delivering a non-optimal solution: the proposed method is an exact version of Ward's method, whereas DLA is not. The experimental results show that DKNNA obtains almost the same clustering result as FPNN. For example, compared with DLA+FS, DKNNA+FS can decrease the average mean squared error by 0.72% on the data set generated from the image “Lena” with N = 16384 and M = 256, where FS is the fast search algorithm for finding nearest neighbors, N is the number of data objects, and M is the number of clusters.
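As a rough illustration of the pairwise-nearest-neighbor framework in which FPNN, DLA, and DKNNA operate, the sketch below performs Ward-criterion agglomerative clustering while keeping one nearest-neighbor pointer per cluster; after each merge, only the clusters whose pointer was invalidated are re-scanned. This is a generic sketch (with illustrative names such as `pnn_ward`), not DKNNA's dynamic k-nearest-neighbor list, which keeps several candidate neighbors per cluster so that even fewer nearest-neighbor determinations are needed.

```python
import numpy as np

def pnn_ward(X, m_target):
    """Pairwise-nearest-neighbor (Ward) clustering with a nearest-neighbor table.

    Every live cluster keeps its centroid, its size, and a pointer to the
    cluster it would be cheapest to merge with under Ward's merge cost.  After
    a merge, only clusters whose pointer referred to one of the merged clusters
    are re-scanned; Ward's criterion is reducible, so the result stays exact.
    Assumes m_target >= 1.
    """
    centroids = [x.astype(float) for x in X]
    sizes = [1] * len(X)
    alive = set(range(len(X)))

    def merge_cost(a, b):
        diff = centroids[a] - centroids[b]
        return sizes[a] * sizes[b] / (sizes[a] + sizes[b]) * float(diff @ diff)

    def nearest(a):
        best, best_cost = None, np.inf
        for b in alive:
            if b != a:
                c = merge_cost(a, b)
                if c < best_cost:
                    best, best_cost = b, c
        return best, best_cost

    nn = {a: nearest(a) for a in alive}
    while len(alive) > m_target:
        a = min(alive, key=lambda c: nn[c][1])     # cheapest merge overall
        b = nn[a][0]
        # Merge b into a: size-weighted centroid, combined size.
        centroids[a] = (sizes[a] * centroids[a] + sizes[b] * centroids[b]) / (sizes[a] + sizes[b])
        sizes[a] += sizes[b]
        alive.remove(b)
        del nn[b]
        if len(alive) <= m_target:
            break
        # Re-scan only clusters whose nearest neighbor was a or b.
        stale = {a} | {c for c in alive if nn[c][0] in (a, b)}
        for c in stale:
            nn[c] = nearest(c)
    return np.array([centroids[i] for i in sorted(alive)])
```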
The experiments show that the proposed algorithms share a common characteristic: their performance advantage becomes more pronounced as the data set grows larger. Note that the partition-based clustering algorithms require the least computing time, whereas the hierarchical divisive and hierarchical agglomerative clustering algorithms achieve better clustering quality. All of the proposed algorithms focus on reducing the computational complexity while retaining the same clustering quality.
author2 | Jim Z. C. Lai
author_facet | Jim Z. C. Lai; Tsung-Jen Huang 黃崇仁
author | Tsung-Jen Huang 黃崇仁
author_sort | Tsung-Jen Huang
title | Fast Clustering Algorithms for Compact Data
title_sort | fast clustering algorithms for compact data
publishDate | 2011
url | http://ndltd.ncl.edu.tw/handle/81593670983905576013
work_keys_str_mv | AT tsungjenhuang fastclusteringalgorithmsforcompactdata AT huángchóngrén fastclusteringalgorithmsforcompactdata AT tsungjenhuang jǐnmìzīliàozhīkuàisùfēnqúnyǎnsuànfǎ AT huángchóngrén jǐnmìzīliàozhīkuàisùfēnqúnyǎnsuànfǎ
_version_ | 1718071620192436224