High-Dimensional Non-Gaussian Data Clustering using Variational Learning of Mixture Models

Bibliographic Details
Main Author: Fan, Wentao
Format: Others
Published: 2013
Online Access: http://spectrum.library.concordia.ca/978077/1/Fan_PhD_S2014.pdf
Fan, Wentao (2013) High-Dimensional Non-Gaussian Data Clustering using Variational Learning of Mixture Models. PhD thesis, Concordia University.
Description
Summary: Clustering has been the topic of extensive research in the past. The main concern is to automatically divide a given data set into clusters such that vectors in the same cluster are as similar as possible and vectors in different clusters are as different as possible. Finite mixture models have been widely used for clustering because they can integrate prior knowledge about the data and address the problem of unsupervised learning in a formal way. A crucial starting point when adopting mixture models is the choice of the component densities. In this context, the well-known Gaussian distribution has been widely used. However, deploying a Gaussian mixture implicitly means clustering by minimizing Euclidean distortions, which may lead to poor results in many real applications where the per-component densities are not Gaussian. Recent work has shown that other models, such as Dirichlet, generalized Dirichlet and Beta-Liouville mixtures, may provide better clustering results in applications involving non-Gaussian data, especially proportional data (or normalized histograms), which many applications naturally generate. Two other challenging aspects must also be addressed when considering mixture models: how to determine the model's complexity (i.e. the number of mixture components) and how to estimate the model's parameters. Fortunately, both problems can be tackled simultaneously within a principled and elegant learning framework, namely variational inference. The main idea of variational inference is to approximate the model's posterior distribution by minimizing the Kullback-Leibler divergence between the exact (or true) posterior and an approximating distribution. Recently, variational inference has provided good generalization performance and computational tractability in many applications, including the learning of mixture models.
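To make the idea of a mixture over proportional data more concrete, the following is a minimal sketch, assuming the standard Dirichlet density Dir(x | alpha) = Gamma(sum alpha) / prod Gamma(alpha_d) * prod x_d^(alpha_d - 1) on the probability simplex. It only evaluates the log-likelihood of normalized histograms under a finite Dirichlet mixture with given weights and parameters; it does not implement the variational learning procedure developed in the thesis. The function names `dirichlet_logpdf` and `mixture_loglik` are hypothetical, chosen for illustration.

```python
import math
from typing import Sequence


def dirichlet_logpdf(x: Sequence[float], alpha: Sequence[float]) -> float:
    """Log-density of Dirichlet(alpha) at a point x on the probability simplex."""
    if len(x) != len(alpha):
        raise ValueError("x and alpha must have the same length")
    if any(a <= 0 for a in alpha) or any(v <= 0 for v in x):
        raise ValueError("alpha and x must be strictly positive")
    if abs(sum(x) - 1.0) > 1e-9:
        raise ValueError("x must sum to 1 (a normalized histogram)")
    # Log of the normalizing constant Gamma(sum alpha) / prod Gamma(alpha_d)
    log_norm = math.lgamma(sum(alpha)) - sum(math.lgamma(a) for a in alpha)
    # Kernel: sum_d (alpha_d - 1) * log x_d
    return log_norm + sum((a - 1.0) * math.log(v) for a, v in zip(alpha, x))


def mixture_loglik(
    X: Sequence[Sequence[float]],
    weights: Sequence[float],
    alphas: Sequence[Sequence[float]],
) -> float:
    """Log-likelihood of data X under a finite mixture of Dirichlet components."""
    total = 0.0
    for x in X:
        # log sum_j pi_j * Dir(x | alpha_j), computed with log-sum-exp for stability
        terms = [math.log(w) + dirichlet_logpdf(x, a) for w, a in zip(weights, alphas)]
        m = max(terms)
        total += m + math.log(sum(math.exp(t - m) for t in terms))
    return total
```

For example, a Dirichlet(1, 1, 1) component is uniform on the simplex, so any normalized 3-bin histogram such as [0.2, 0.3, 0.5] receives log-density log 2 under it.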
In this thesis, we propose several approaches for clustering high-dimensional non-Gaussian data based on various mixture models, such as Dirichlet, generalized Dirichlet and Beta-Liouville mixtures. These mixture models are learned using variational inference, whose main advantages are computational efficiency and guaranteed convergence. More specifically, our contributions are fourfold. First, we develop a variational inference algorithm for learning the finite Dirichlet mixture model, in which the model parameters and the model complexity can be determined automatically and simultaneously as part of the Bayesian inference procedure. Second, we integrate an unsupervised feature selection scheme with the finite generalized Dirichlet mixture model to cluster high-dimensional non-Gaussian data. Third, we extend the proposed finite generalized Dirichlet mixture model to the infinite case using a nonparametric Bayesian framework known as the Dirichlet process, which sidesteps the difficulty of choosing the appropriate number of clusters by assuming an infinite number of mixture components. Finally, we propose an online learning framework for a Dirichlet process mixture of Beta-Liouville distributions (i.e. an infinite Beta-Liouville mixture model), which is better suited than batch learning algorithms to sequential or large-scale data. The effectiveness of our approaches is evaluated on both synthetic data and challenging real-life applications, such as image database categorization, anomaly intrusion detection, human action video categorization, image annotation, facial expression recognition, behavior recognition, and dynamic texture clustering.
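The Dirichlet process mentioned above is often represented through its stick-breaking construction, and variational treatments of Dirichlet process mixtures commonly truncate it at a finite level. The sketch below draws mixing weights from such a truncated stick-breaking process; the truncation level, the concentration parameter value, and the function name `stick_breaking_weights` are assumptions for illustration and are not taken from the thesis.

```python
import random


def stick_breaking_weights(concentration: float, truncation: int, rng: random.Random) -> list[float]:
    """Draw mixing weights from a truncated stick-breaking construction of a Dirichlet process.

    Hypothetical helper: each step breaks off a Beta(1, concentration) fraction of the
    remaining stick; the final component takes whatever is left so the weights sum to 1.
    """
    if concentration <= 0:
        raise ValueError("concentration must be positive")
    if truncation < 1:
        raise ValueError("truncation must be at least 1")
    weights: list[float] = []
    remaining = 1.0
    for _ in range(truncation - 1):
        v = rng.betavariate(1.0, concentration)  # fraction of the remaining stick
        weights.append(remaining * v)
        remaining *= 1.0 - v
    weights.append(remaining)  # last stick absorbs the leftover mass
    return weights
```

With a small concentration, each Beta(1, concentration) draw tends to be large, so most of the mass typically falls on the first few components; this is one way to see how the effective number of clusters can be inferred rather than fixed in advance.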