On Clustering Histograms with k-Means by Using Mixed α-Divergences

Clustering sets of histograms has become popular thanks to the success of the generic method of bag-of-X used in text categorization and in visual categorization applications. In this paper, we investigate the use of a parametric family of distortion measures, called the α-divergences, for clust...

Bibliographic Details
Main Authors: Frank Nielsen, Richard Nock, Shun-ichi Amari
Format: Article
Language: English
Published: MDPI AG 2014-06-01
Series: Entropy
Subjects: bag-of-X; α-divergence; Jeffreys divergence; centroid; k-means clustering; k-means seeding
Online Access: http://www.mdpi.com/1099-4300/16/6/3273
id doaj-67bab809f23a4f98ae31c50a1cd6b3e6
record_format Article
spelling doaj-67bab809f23a4f98ae31c50a1cd6b3e6
2020-11-25T00:03:34Z
eng
MDPI AG
Entropy
1099-4300
2014-06-01
16
6
3273
3301
10.3390/e16063273
e16063273
On Clustering Histograms with k-Means by Using Mixed α-Divergences
Frank Nielsen (Sony Computer Science Laboratories, Inc, Tokyo 141-0022, Japan)
Richard Nock (NICTA and The Australian National University, Locked Bag 9013, Alexandria NSW 1435, Australia)
Shun-ichi Amari (RIKEN Brain Science Institute, 2-1 Hirosawa Wako City, Saitama 351-0198, Japan)
Clustering sets of histograms has become popular thanks to the success of the generic method of bag-of-X used in text categorization and in visual categorization applications. In this paper, we investigate the use of a parametric family of distortion measures, called the α-divergences, for clustering histograms. Since it usually makes sense to deal with symmetric divergences in information retrieval systems, we symmetrize the α-divergences using the concept of mixed divergences. First, we present a novel extension of k-means clustering to mixed divergences. Second, we extend the k-means++ seeding to mixed α-divergences and report a guaranteed probabilistic bound. Finally, we describe a soft clustering technique for mixed α-divergences.
http://www.mdpi.com/1099-4300/16/6/3273
bag-of-X
α-divergence
Jeffreys divergence
centroid
k-means clustering
k-means seeding
collection DOAJ
language English
format Article
sources DOAJ
author Frank Nielsen
Richard Nock
Shun-ichi Amari
title On Clustering Histograms with k-Means by Using Mixed α-Divergences
publisher MDPI AG
series Entropy
issn 1099-4300
publishDate 2014-06-01
description Clustering sets of histograms has become popular thanks to the success of the generic method of bag-of-X used in text categorization and in visual categorization applications. In this paper, we investigate the use of a parametric family of distortion measures, called the α-divergences, for clustering histograms. Since it usually makes sense to deal with symmetric divergences in information retrieval systems, we symmetrize the α-divergences using the concept of mixed divergences. First, we present a novel extension of k-means clustering to mixed divergences. Second, we extend the k-means++ seeding to mixed α-divergences and report a guaranteed probabilistic bound. Finally, we describe a soft clustering technique for mixed α-divergences.
topic bag-of-X
α-divergence
Jeffreys divergence
centroid
k-means clustering
k-means seeding
url http://www.mdpi.com/1099-4300/16/6/3273