ClickClust: An R Package for Model-Based Clustering of Categorical Sequences

The R package ClickClust is a new piece of software devoted to finite mixture modeling and model-based clustering of categorical sequences. As a special kind of time series, categorical sequences, also known as categorical time series, exhibit a time-dependent nature and are traditionally modeled by...

Bibliographic Details
Main Author: Volodymyr Melnykov
Format: Article
Language: English
Published: Foundation for Open Access Statistics 2016-10-01
Series: Journal of Statistical Software
Subjects:
R
Online Access: https://www.jstatsoft.org/index.php/jss/article/view/2897
id doaj-a8556f57a25049f0ab1bb8d49b2e3b60
record_format Article
spelling doaj-a8556f57a25049f0ab1bb8d49b2e3b60; 2020-11-24T22:43:26Z; eng; Foundation for Open Access Statistics; Journal of Statistical Software; 1548-7660; 2016-10-01; 74; 1; 1; 34; 10.18637/jss.v074.i09; 1058; ClickClust: An R Package for Model-Based Clustering of Categorical Sequences; Volodymyr Melnykov; https://www.jstatsoft.org/index.php/jss/article/view/2897; categorical sequences; model-based cluster analysis; finite mixture models; Markov models; biclustering; click-plot; R
collection DOAJ
language English
format Article
sources DOAJ
author Volodymyr Melnykov
title ClickClust: An R Package for Model-Based Clustering of Categorical Sequences
publisher Foundation for Open Access Statistics
series Journal of Statistical Software
issn 1548-7660
publishDate 2016-10-01
description The R package ClickClust is a new piece of software devoted to finite mixture modeling and model-based clustering of categorical sequences. As a special kind of time series, categorical sequences, also known as categorical time series, exhibit a time-dependent nature and are traditionally modeled by means of Markov chains. Clustering categorical sequences is an important problem with many applications; among the best known is grouping sequences of visited sites or web pages, also known as clickstreams, which helps discover common navigation patterns and routes taken by users. This popular application is recognized in the package title ClickClust. The paper discusses the methodological and algorithmic foundations of the package, which are based on finite mixtures of Markov models. The number of Markov chain states can often be large, leading to high-dimensional transition probability matrices. The resulting high number of model parameters can severely affect clustering performance. As a remedy, backward and forward selection algorithms are proposed for grouping states. This extends the original clustering problem to a biclustering framework. Other capabilities of ClickClust include estimation of the variance-covariance matrix of the model parameter estimates, prediction of future states visited, and construction of a display named click-plot that helps illustrate the obtained clustering solutions. All available functions and the utility of the package are thoroughly discussed and illustrated with multiple examples.
topic categorical sequences
model-based cluster analysis
finite mixture models
Markov models
biclustering
click-plot
R
url https://www.jstatsoft.org/index.php/jss/article/view/2897