Summary: | Probabilistic topic models are a versatile class of models for discovering latent themes in document collections through unsupervised learning. Conventional inference methods lack the scalability required for large-scale applications. In recent years, Stochastic Expectation Maximization has proven scalable for the simplest topic model, Latent Dirichlet Allocation; for many more complex topic models, however, analytical maximization is not possible. With the rise of probabilistic programming languages, the ability to infer flexibly specified probabilistic models using sophisticated numerical optimization procedures has become widely available. These frameworks have, however, mainly been developed for optimizing continuous parameters, often prohibiting direct optimization of discrete parameters. This thesis explores the potential of probabilistic programming for generic topic modeling using Stochastic Expectation Maximization, with numerical maximization of discrete parameters reparameterized to unconstrained space. In simulated experiments, the method achieves results of quality comparable to that of established methods for Latent Dirichlet Allocation. It is further applied to a Dirichlet-multinomial Regression model with metadata covariates on a real dataset, where it produces interpretable topics.
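To illustrate the core idea of the reparameterization mentioned above — not the thesis's actual implementation — the sketch below maximizes a multinomial log-likelihood over a simplex-constrained probability vector by mapping unconstrained logits through a softmax and running plain gradient ascent. Function names and hyperparameters are illustrative assumptions; a probabilistic programming framework would typically supply the transform and optimizer.

```python
import numpy as np

def softmax(z):
    # Map unconstrained logits z to a point on the probability simplex.
    e = np.exp(z - z.max())
    return e / e.sum()

def m_step_numeric(counts, steps=500, lr=0.1):
    """Numerically maximize the multinomial log-likelihood
    sum_k counts[k] * log p_k over the simplex by gradient
    ascent on unconstrained logits z, where p = softmax(z)."""
    z = np.zeros_like(counts, dtype=float)  # unconstrained parameters
    n_total = counts.sum()
    for _ in range(steps):
        p = softmax(z)
        # Gradient of the log-likelihood with respect to z_j is
        # counts[j] - n_total * p_j (standard softmax calculus).
        grad = counts - n_total * p
        z += lr * grad / n_total
    return softmax(z)

counts = np.array([5.0, 3.0, 2.0])
p_hat = m_step_numeric(counts)
# The closed-form maximizer is counts / counts.sum();
# the numerical optimum should lie close to it.
```

In this toy case the closed-form M-step (normalizing the counts) exists, which makes it a useful sanity check; the point of the unconstrained parameterization is that the same gradient-based maximization still applies when no closed form is available.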