A Theoretical Study of Clusterability and Clustering Quality

Clustering is a widely used technique, with applications ranging from data mining, bioinformatics and image analysis to marketing, psychology, and city planning. Despite the practical importance of clustering, there is very limited theoretical analysis of the topic. We make a step towards building t...

Bibliographic Details
Main Author: Ackerman, Margareta
Language: en
Published: 2008
Subjects: clustering; clustering quality; clusterability; data mining; Computer Science
Online Access: http://hdl.handle.net/10012/3478
id ndltd-WATERLOO-oai-uwspace.uwaterloo.ca-10012-3478
record_format oai_dc
spelling ndltd-WATERLOO-oai-uwspace.uwaterloo.ca-10012-3478 | 2013-01-08T18:50:59Z | Ackerman, Margareta | 2008-01-16T15:36:19Z | 2008-01-16T15:36:19Z | 2008-01-16T15:36:19Z | 2007 | http://hdl.handle.net/10012/3478 | Clustering is a widely used technique, with applications ranging from data mining, bioinformatics and image analysis to marketing, psychology, and city planning. Despite the practical importance of clustering, there is very limited theoretical analysis of the topic. We take a step towards building theoretical foundations for clustering by carrying out an abstract analysis of two central concepts in clustering: clusterability and clustering quality. We compare a number of notions of clusterability found in the literature. While all these notions attempt to measure the same property, and all appear to be reasonable, we show that they are pairwise inconsistent. In addition, we give the first computational complexity analysis of a few notions of clusterability. In the second part of the thesis, we discuss how the quality of a given clustering can be defined (and measured). Users often need to compare the quality of clusterings obtained by different methods. Perhaps more importantly, users need to determine whether a given clustering is good enough to be used in further data mining analysis. We analyze what a measure of clustering quality should look like by introducing a set of requirements ('axioms') for clustering quality measures. We propose a number of clustering quality measures that satisfy these requirements. | en | clustering | clustering quality | clusterability | data mining | A Theoretical Study of Clusterability and Clustering Quality | Thesis or Dissertation | School of Computer Science | Master of Mathematics | Computer Science
collection NDLTD
language en
sources NDLTD
topic clustering
clustering quality
clusterability
data mining
Computer Science
description Clustering is a widely used technique, with applications ranging from data mining, bioinformatics and image analysis to marketing, psychology, and city planning. Despite the practical importance of clustering, there is very limited theoretical analysis of the topic. We take a step towards building theoretical foundations for clustering by carrying out an abstract analysis of two central concepts in clustering: clusterability and clustering quality. We compare a number of notions of clusterability found in the literature. While all these notions attempt to measure the same property, and all appear to be reasonable, we show that they are pairwise inconsistent. In addition, we give the first computational complexity analysis of a few notions of clusterability. In the second part of the thesis, we discuss how the quality of a given clustering can be defined (and measured). Users often need to compare the quality of clusterings obtained by different methods. Perhaps more importantly, users need to determine whether a given clustering is good enough to be used in further data mining analysis. We analyze what a measure of clustering quality should look like by introducing a set of requirements ('axioms') for clustering quality measures. We propose a number of clustering quality measures that satisfy these requirements.
author Ackerman, Margareta
title A Theoretical Study of Clusterability and Clustering Quality
publishDate 2008
url http://hdl.handle.net/10012/3478