ClickClust: An R Package for Model-Based Clustering of Categorical Sequences
The R package ClickClust is a new piece of software devoted to finite mixture modeling and model-based clustering of categorical sequences. As a special kind of time series, categorical sequences, also known as categorical time series, exhibit a time-dependent nature and are traditionally modeled by...
Main Author: | Volodymyr Melnykov |
---|---|
Format: | Article |
Language: | English |
Published: | Foundation for Open Access Statistics, 2016-10-01 |
Series: | Journal of Statistical Software |
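Subjects: | categorical sequences; model-based cluster analysis; finite mixture models; Markov models; biclustering; click-plot; R |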
Online Access: | https://www.jstatsoft.org/index.php/jss/article/view/2897 |
id |
doaj-a8556f57a25049f0ab1bb8d49b2e3b60 |
---|---|
record_format |
Article |
spelling |
doaj-a8556f57a25049f0ab1bb8d49b2e3b60 2020-11-24T22:43:26Z eng Foundation for Open Access Statistics Journal of Statistical Software 1548-7660 2016-10-01 74 1 1 34 10.18637/jss.v074.i09 1058 ClickClust: An R Package for Model-Based Clustering of Categorical Sequences Volodymyr Melnykov The R package ClickClust is a new piece of software devoted to finite mixture modeling and model-based clustering of categorical sequences. As a special kind of time series, categorical sequences, also known as categorical time series, exhibit a time-dependent nature and are traditionally modeled by means of Markov chains. Clustering categorical sequences is an important problem with many applications; one of the best known is grouping sequences of visited sites or web pages, also known as clickstreams, which helps discover common navigation patterns and routes taken by users. This popular application is recognized in the package title ClickClust. The paper discusses methodological and algorithmic foundations of the package based on finite mixtures of Markov models. The number of Markov chain states can often be large, leading to high-dimensional transition probability matrices, and the resulting high number of model parameters can severely affect clustering performance. As a remedy to this problem, backward and forward selection algorithms are proposed for grouping states. This extends the original clustering problem to a biclustering framework. Other capabilities of ClickClust include estimation of the variance-covariance matrix corresponding to model parameter estimates, prediction of future states visited, and construction of a display named click-plot that helps illustrate the obtained clustering solutions. All available functions and the utility of the package are thoroughly discussed and illustrated with multiple examples. https://www.jstatsoft.org/index.php/jss/article/view/2897 categorical sequences; model-based cluster analysis; finite mixture models; Markov models; biclustering; click-plot; R |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Volodymyr Melnykov |
title |
ClickClust: An R Package for Model-Based Clustering of Categorical Sequences |
publisher |
Foundation for Open Access Statistics |
series |
Journal of Statistical Software |
issn |
1548-7660 |
publishDate |
2016-10-01 |
description |
The R package ClickClust is a new piece of software devoted to finite mixture modeling and model-based clustering of categorical sequences. As a special kind of time series, categorical sequences, also known as categorical time series, exhibit a time-dependent nature and are traditionally modeled by means of Markov chains. Clustering categorical sequences is an important problem with many applications; one of the best known is grouping sequences of visited sites or web pages, also known as clickstreams, which helps discover common navigation patterns and routes taken by users. This popular application is recognized in the package title ClickClust. The paper discusses methodological and algorithmic foundations of the package based on finite mixtures of Markov models. The number of Markov chain states can often be large, leading to high-dimensional transition probability matrices, and the resulting high number of model parameters can severely affect clustering performance. As a remedy to this problem, backward and forward selection algorithms are proposed for grouping states. This extends the original clustering problem to a biclustering framework. Other capabilities of ClickClust include estimation of the variance-covariance matrix corresponding to model parameter estimates, prediction of future states visited, and construction of a display named click-plot that helps illustrate the obtained clustering solutions. All available functions and the utility of the package are thoroughly discussed and illustrated with multiple examples. |
topic |
categorical sequences; model-based cluster analysis; finite mixture models; Markov models; biclustering; click-plot; R |
url |
https://www.jstatsoft.org/index.php/jss/article/view/2897 |
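The description in this record summarizes clustering by finite mixtures of Markov models. As a rough illustration of that general idea only, the following Python sketch runs a basic EM algorithm for a mixture of first-order Markov chains. It is not the ClickClust implementation or its API: the function names `transition_counts` and `fit_markov_mixture`, the `smooth` pseudo-count, and the random initialization are all assumptions made for this example, and the initial-state distribution is ignored for brevity.

```python
import numpy as np


def transition_counts(seq, n_states):
    """Count first-order transitions i -> j in one categorical sequence."""
    counts = np.zeros((n_states, n_states))
    for a, b in zip(seq[:-1], seq[1:]):
        counts[a, b] += 1
    return counts


def fit_markov_mixture(seqs, n_states, n_clusters, n_iter=100, seed=0, smooth=1e-3):
    """Illustrative EM for a finite mixture of first-order Markov chains.

    Hypothetical helper, not part of any package. States are coded 0..n_states-1.
    Returns mixing weights, per-cluster transition matrices, and hard labels.
    """
    rng = np.random.default_rng(seed)
    # Per-sequence transition counts, shape (n_sequences, n_states, n_states).
    counts = np.stack([transition_counts(s, n_states) for s in seqs])
    # Random soft assignment of sequences to clusters as a starting point.
    resp = rng.dirichlet(np.ones(n_clusters), size=len(seqs))
    for _ in range(n_iter):
        # M-step: mixing weights and responsibility-weighted transition matrices.
        weights = resp.mean(axis=0)
        pooled = np.einsum("nk,nij->kij", resp, counts) + smooth
        trans = pooled / pooled.sum(axis=2, keepdims=True)
        # E-step: posterior cluster probabilities, computed on the log scale.
        log_post = np.log(np.maximum(weights, 1e-12))
        log_post = log_post + np.einsum("nij,kij->nk", counts, np.log(trans))
        log_post -= log_post.max(axis=1, keepdims=True)
        resp = np.exp(log_post)
        resp /= resp.sum(axis=1, keepdims=True)
    return weights, trans, resp.argmax(axis=1)


if __name__ == "__main__":
    # Toy data: two alternating sequences and two constant sequences.
    seqs = [[0, 1] * 10, [1, 0] * 10, [0] * 20, [1] * 20]
    weights, trans, labels = fit_markov_mixture(seqs, n_states=2, n_clusters=2)
    print(weights, labels)
```

Each cluster here carries a full transition matrix, so the parameter count grows with the square of the number of states, which is the dimensionality concern the description raises. EM can also settle in a local optimum, so in practice one would typically compare several random starts.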