Analysis and tuning of hierarchical topic models based on Renyi entropy approach

Hierarchical topic modeling is a potentially powerful instrument for determining topical structures of text collections that additionally allows constructing a hierarchy representing the levels of topic abstractness. However, parameter optimization in hierarchical models, which includes finding an a...

Bibliographic Details
Main Authors: Sergei Koltcov, Vera Ignatenko, Maxim Terpilovskii, Paolo Rosso
Format: Article
Language: English
Published: PeerJ Inc. 2021-07-01
Series: PeerJ Computer Science
Subjects: Topic modeling; Renyi entropy; Hierarchical topic models; Optimal number of topics
Online Access: https://peerj.com/articles/cs-608.pdf
id doaj-804f85a0030b49e1bc7716227ac5e711
record_format Article
spelling doaj-804f85a0030b49e1bc7716227ac5e711 2021-07-31T15:05:05Z eng
PeerJ Inc., PeerJ Computer Science, ISSN 2376-5992, 2021-07-01, vol. 7, e608, doi:10.7717/peerj-cs.608
Analysis and tuning of hierarchical topic models based on Renyi entropy approach
Sergei Koltcov, Vera Ignatenko, Maxim Terpilovskii, Paolo Rosso
Laboratory for Social and Cognitive Informatics, National Research University Higher School of Economics, St. Petersburg, Russia (affiliation listed for all four authors)
https://peerj.com/articles/cs-608.pdf
Topic modeling; Renyi entropy; Hierarchical topic models; Optimal number of topics
collection DOAJ
language English
format Article
sources DOAJ
author Sergei Koltcov
Vera Ignatenko
Maxim Terpilovskii
Paolo Rosso
publisher PeerJ Inc.
series PeerJ Computer Science
issn 2376-5992
publishDate 2021-07-01
description Hierarchical topic modeling is a potentially powerful instrument for determining topical structures of text collections that additionally allows constructing a hierarchy representing the levels of topic abstractness. However, parameter optimization in hierarchical models, which includes finding an appropriate number of topics at each level of the hierarchy, remains a challenging task. In this paper, we propose an approach based on Renyi entropy as a partial solution to this problem. First, we introduce a Renyi entropy-based quality metric for hierarchical models. Second, we propose a practical approach to obtaining the “correct” number of topics in hierarchical topic models and show how model hyperparameters should be tuned for that purpose. We test this approach on datasets with a known number of topics, as determined by human mark-up; three of these datasets are in English and one is in Russian. In the numerical experiments, we consider three different hierarchical models: the hierarchical latent Dirichlet allocation model (hLDA), the hierarchical Pachinko allocation model (hPAM), and hierarchical additive regularization of topic models (hARTM). We demonstrate that the hLDA model possesses a significant level of instability and, moreover, that the derived numbers of topics are far from the true numbers for the labeled datasets. For the hPAM model, the Renyi entropy approach allows determining only one level of the data structure. For the hARTM model, the proposed approach allows us to estimate the number of topics for two levels of the hierarchy.
topic Topic modeling
Renyi entropy
Hierarchical topic models
Optimal number of topics
url https://peerj.com/articles/cs-608.pdf