Spectral clustering: An explorative study of proximity measures

In cluster analysis, data are clustered into meaningful groups so that the objects in the same group are very similar, and the objects residing in two different groups are different from one another. One such cluster analysis algorithm is called the spectral clustering algorithm, which originated fr...

Bibliographic Details
Main Author:	Azam, Nadia Farhanaz
Format:	Others
Language:	en
Published:	University of Ottawa (Canada) 2013
Subjects:	Computer Science.
Online Access:	http://hdl.handle.net/10393/28238 http://dx.doi.org/10.20381/ruor-19150

id	ndltd-uottawa.ca-oai-ruor.uottawa.ca-10393-28238
record_format	oai_dc
spelling	ndltd-uottawa.ca-oai-ruor.uottawa.ca-10393-28238 2018-01-05T19:07:54Z Spectral clustering: An explorative study of proximity measures Azam, Nadia Farhanaz Computer Science. 2013-11-07T19:04:05Z 2013-11-07T19:04:05Z 2009 2009 Thesis Source: Masters Abstracts International, Volume: 48-05, page: 3035. http://hdl.handle.net/10393/28238 http://dx.doi.org/10.20381/ruor-19150 en 196 p. University of Ottawa (Canada)
collection	NDLTD
language	en
format	Others
sources	NDLTD
topic	Computer Science.
description	In cluster analysis, data are clustered into meaningful groups so that objects in the same group are very similar and objects in different groups are dissimilar. One such cluster analysis algorithm is the spectral clustering algorithm, which originated in the area of graph partitioning. The input, in this case, is a similarity matrix constructed from the pairwise similarity between data objects. The algorithm uses the eigenvalues and eigenvectors of a normalized similarity matrix to partition the data. The pairwise similarity between objects is calculated from a proximity measure (e.g. a similarity or distance measure). In any clustering task, the proximity measure often plays a crucial role. In fact, one of the early and fundamental steps in a clustering process is the selection of a suitable proximity measure. A number of such measures may be used for this task, and the success of a clustering algorithm partially depends on which one is selected. While the majority of prior research on the spectral clustering algorithm emphasizes algorithm-specific issues, little research has evaluated the performance of the proximity measures. To this end, we perform a comparative and exploratory analysis of several existing proximity measures to evaluate their performance when applying the spectral clustering algorithm to a number of diverse data sets. To accomplish this task, we use ten-fold cross-validation and assess the clustering results using several external cluster evaluation measures. The performances of the proximity measures are then compared using the quantitative results from the external evaluation measures and analyzed further to determine the probable causes of those results. In essence, our experimental evaluation indicates that the proximity measures, in general, yield comparable results. That is, no measure is clearly superior, or inferior, to the others in its group. However, among the six similarity measures considered for binary data, one measure (the Russell and Rao similarity coefficient) frequently performed worse than the others. For numeric data, our study shows that the distance measures based on relative distances (i.e. the Pearson correlation coefficient and the angular distance) generally performed better than the distance measures based on absolute distances (e.g. the Euclidean or Manhattan distance). When considering the proximity measures for mixed data, our results indicate that the choice of distance measure for the numeric data has the highest impact on the final outcome.
author	Azam, Nadia Farhanaz
title	Spectral clustering: An explorative study of proximity measures
publisher	University of Ottawa (Canada)
publishDate	2013
url	http://hdl.handle.net/10393/28238 http://dx.doi.org/10.20381/ruor-19150
work_keys_str_mv	AT azamnadiafarhanaz spectralclusteringanexplorativestudyofproximitymeasures
_version_	1718602549092679680
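
The abstract in this record describes partitioning data from the eigenvectors of a normalized similarity matrix, with the similarity matrix built from a chosen proximity measure. The following is a minimal sketch of that idea, not the thesis's implementation: it assumes a Gaussian kernel to turn distances into similarities, restricts itself to a two-way split using the sign of the second eigenvector, and uses hypothetical function names (`euclidean_distances`, `pearson_distances`, `gaussian_similarity`, `spectral_bipartition`) and toy data that do not come from the record.

```python
import numpy as np


def euclidean_distances(X):
    """Pairwise Euclidean (absolute) distances between the rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def pearson_distances(X):
    """Pairwise Pearson (relative) distances: 1 - correlation between rows."""
    return 1.0 - np.corrcoef(X)


def gaussian_similarity(D, sigma):
    """Convert a distance matrix to similarities (kernel choice is an assumption)."""
    return np.exp(-(D ** 2) / (2.0 * sigma ** 2))


def spectral_bipartition(W):
    """Split objects into two groups from a symmetric similarity matrix W."""
    degree = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(degree)
    # Symmetrically normalized similarity matrix D^{-1/2} W D^{-1/2}.
    N = W * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order for symmetric matrices.
    _, eigenvectors = np.linalg.eigh(N)
    # Eigenvector of the second-largest eigenvalue, rescaled by D^{-1/2}.
    second = eigenvectors[:, -2] * d_inv_sqrt
    return (second > 0).astype(int)


# Hypothetical toy data: rows 0-2 share a rising profile, rows 3-5 a falling
# one, at very different magnitudes.
profiles = np.array(
    [
        [1, 2, 3, 4],
        [10, 20, 30, 41],
        [2, 4, 6, 7],
        [4, 3, 2, 1],
        [40, 30, 20, 11],
        [8, 6, 4, 3],
    ],
    dtype=float,
)

labels = spectral_bipartition(
    gaussian_similarity(pearson_distances(profiles), sigma=0.5)
)
print(labels)
```

The sign of an eigenvector is arbitrary, so the two label values may be swapped between runs or platforms; only the grouping is meaningful. Passing `euclidean_distances(profiles)` instead would compare the rows by magnitude rather than by shape, which is the kind of difference between absolute and relative distance measures that the abstract refers to.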

Similar Items