On Clustering Histograms with k-Means by Using Mixed α-Divergences

Bibliographic Details
Main Authors: Frank Nielsen, Richard Nock, Shun-ichi Amari
Format: Article
Language: English
Published: MDPI AG 2014-06-01
Series: Entropy
Online Access: http://www.mdpi.com/1099-4300/16/6/3273
Description
Summary: Clustering sets of histograms has become popular thanks to the success of the generic method of bag-of-X used in text categorization and in visual categorization applications. In this paper, we investigate the use of a parametric family of distortion measures, called the α-divergences, for clustering histograms. Since it usually makes sense to deal with symmetric divergences in information retrieval systems, we symmetrize the α-divergences using the concept of mixed divergences. First, we present a novel extension of k-means clustering to mixed divergences. Second, we extend the k-means++ seeding to mixed α-divergences and report a guaranteed probabilistic bound. Finally, we describe a soft clustering technique for mixed α-divergences.
ISSN: 1099-4300
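
Note: The summary mentions symmetrizing α-divergences through mixed divergences but gives no formulas. The following is a minimal sketch of one way to read that, assuming the commonly used Amari form of the α-divergence for positive arrays (α ≠ ±1) and a mixed divergence defined as a convex combination of two sided divergences. The function names and the `lam` parameter are hypothetical, histograms are assumed to have strictly positive entries, and the article's own conventions may differ.

```python
import numpy as np

def alpha_divergence(p, q, alpha):
    """Amari-style alpha-divergence D_alpha(p : q) for positive arrays.

    Assumed form (alpha != +/-1):
      4 / (1 - alpha^2) * sum( (1-alpha)/2 * p + (1+alpha)/2 * q
                               - p^((1-alpha)/2) * q^((1+alpha)/2) )
    """
    if abs(alpha) == 1.0:
        raise ValueError("alpha = +/-1 are limit cases not handled in this sketch")
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    a = (1.0 - alpha) / 2.0
    b = (1.0 + alpha) / 2.0
    return 4.0 / (1.0 - alpha ** 2) * np.sum(a * p + b * q - p ** a * q ** b)

def mixed_alpha_divergence(p, q, r, alpha, lam=0.5):
    """Assumed mixed divergence: lam * D(p : q) + (1 - lam) * D(q : r)."""
    return (lam * alpha_divergence(p, q, alpha)
            + (1.0 - lam) * alpha_divergence(q, r, alpha))

def symmetrized_alpha_divergence(p, q, alpha):
    """Symmetrized case: take r = p and lam = 1/2."""
    return mixed_alpha_divergence(p, q, p, alpha, lam=0.5)

# Two small normalized histograms used only for illustration.
p = np.array([0.5, 0.5])
q = np.array([0.9, 0.1])
d_pq = symmetrized_alpha_divergence(p, q, alpha=0.5)
d_qp = symmetrized_alpha_divergence(q, p, alpha=0.5)
```

With these definitions, `d_pq` and `d_qp` agree, which is the symmetry property the summary motivates for information retrieval settings.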