Stochastic EM for generic topic modeling using probabilistic programming

Bibliographic Details
Main Author: Saberi Nasseri, Robin
Format: Others
Language: English
Published: Uppsala universitet, Statistiska institutionen 2021
Subjects:
SEM
LDA
DMR
TFP
Online Access:http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-447568
Description
Summary: Probabilistic topic models are a versatile class of models for discovering latent themes in document collections through unsupervised learning. Conventional inferential methods lack the scaling capabilities necessary for extensions to large-scale applications. In recent years Stochastic Expectation Maximization has proven scalable for the simplest topic model: Latent Dirichlet Allocation. For many more complex topic models, however, analytical maximization is not possible. With the rise of probabilistic programming languages, the ability to infer flexibly specified probabilistic models using sophisticated numerical optimization procedures has become widely available. These frameworks have, however, mainly been developed for the optimization of continuous parameters, often prohibiting direct optimization of discrete parameters. This thesis explores the potential of using probabilistic programming for generic topic modeling with Stochastic Expectation Maximization, numerically maximizing discrete parameters reparameterized to unconstrained space. In simulated experiments the method achieves results of quality comparable to other methods for Latent Dirichlet Allocation. It is further applied to infer a Dirichlet-multinomial Regression model with metadata covariates on a real dataset, producing interpretable topics.
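
The core idea in the abstract, replacing the analytical M-step with gradient-based numerical maximization of simplex-constrained parameters mapped to unconstrained space via softmax, can be illustrated with a minimal sketch. The snippet below is not the thesis implementation: it shows one plausible Stochastic EM loop for plain LDA in TensorFlow (the host framework of TFP), on a random toy corpus. All shapes, hyperparameters, and names such as phi_logits and theta_logits are illustrative assumptions, and the M-step is approximated by a single Adam gradient step per iteration.

```python
# Hypothetical sketch of Stochastic EM for LDA with the simplex-constrained
# parameters (theta, phi) reparameterized to unconstrained logits, so the
# M-step can be done by gradient-based numerical maximization rather than
# analytically. Toy data; not the thesis code.
import tensorflow as tf

V, K, D, N = 50, 3, 20, 30                # vocab size, topics, docs, words/doc
rng = tf.random.Generator.from_seed(0)
words = rng.uniform((D, N), 0, V, dtype=tf.int32)   # random toy corpus

# Unconstrained parameters; softmax maps them back onto the simplex.
theta_logits = tf.Variable(rng.normal((D, K)))      # doc-topic logits
phi_logits = tf.Variable(rng.normal((K, V)))        # topic-word logits
opt = tf.keras.optimizers.Adam(0.1)

for step in range(200):
    # Stochastic E-step: sample topic assignments z_dn ~ p(z | w, theta, phi).
    log_theta = tf.nn.log_softmax(theta_logits)             # (D, K)
    log_phi = tf.nn.log_softmax(phi_logits)                 # (K, V)
    log_phi_w = tf.gather(tf.transpose(log_phi), words)     # (D, N, K)
    post_logits = log_theta[:, None, :] + log_phi_w         # (D, N, K)
    z = tf.random.categorical(tf.reshape(post_logits, (-1, K)), 1)
    z = tf.cast(tf.reshape(z, (D, N)), tf.int32)

    # Numerical M-step: one gradient step on the complete-data log
    # likelihood, in place of the closed-form count-based update.
    with tf.GradientTape() as tape:
        log_theta = tf.nn.log_softmax(theta_logits)
        log_phi = tf.nn.log_softmax(phi_logits)
        ll_theta = tf.gather(log_theta, z, axis=1, batch_dims=1)        # (D, N)
        ll_phi = tf.gather_nd(log_phi, tf.stack([z, words], axis=-1))   # (D, N)
        loss = -tf.reduce_sum(ll_theta + ll_phi)   # Dirichlet priors omitted
    grads = tape.gradient(loss, [theta_logits, phi_logits])
    opt.apply_gradients(zip(grads, [theta_logits, phi_logits]))
```

Because the M-step here is purely numerical, the same pattern extends, as the abstract notes, to models without closed-form maximizers such as Dirichlet-multinomial Regression, by substituting the appropriate complete-data likelihood.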