Sparsification for Topic Modeling and Applications to Information Retrieval
Main Author: | Muoh, Chibuike |
---|---|
Language: | English |
Published: | Kent State University / OhioLINK, 2009 |
Subjects: | Computer Science; PLSA; information retrieval; clustering; topic model |
Online Access: | http://rave.ohiolink.edu/etdc/view?acc_num=kent1259206719 |
id |
ndltd-OhioLink-oai-etd.ohiolink.edu-kent1259206719 |
---|---|
record_format |
oai_dc |
abstract |
In this thesis we tackle the problem of improving the probabilistic topic models used in information retrieval and text mining to discover the latent semantic structure within a corpus. Probabilistic topic models such as PLSA (Probabilistic Latent Semantic Analysis) rely only on statistical inference from the corpus count data to reconstruct the underlying topic structure. However, we observe that the baseline PLSA topic model suffers from over-fitting, which can reduce modeling accuracy. The techniques outlined in this thesis therefore aim to produce a more selective generative model than the baseline while still retaining the important underlying topic structure of a corpus. We propose two new algorithms. L0-optimization is a post-processing sparsification approach for language modeling that uses information-theoretic measures. L2-sparsification reformulates the document likelihood equation so that spurious parameters are removed from the model as it is estimated. |
date |
2009-11-30 |
rights |
Access unrestricted. This thesis or dissertation is protected by copyright: all rights reserved. It may not be copied or redistributed beyond the terms of applicable copyright laws. |
collection |
NDLTD |
sources |
NDLTD |
topic |
Computer Science; PLSA; information retrieval; clustering; topic model |
author |
Muoh, Chibuike |
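The abstract describes PLSA and post-processing sparsification only at a high level. As a hedged illustration, the sketch below fits a standard PLSA model by expectation-maximization on a toy document-word count matrix and then prunes small P(w|z) entries. This is not the thesis's L0-optimization or L2-sparsification, whose details this record does not give: the pruning step is a simple magnitude threshold swapped in for the information-theoretic criterion, and the function names `plsa` and `prune`, the threshold, and the toy counts are all hypothetical.

```python
import random


def normalize(row):
    """Scale a list of non-negative weights to sum to 1 (uniform if all zero)."""
    total = sum(row)
    if total == 0:
        return [1.0 / len(row)] * len(row)
    return [x / total for x in row]


def plsa(counts, n_topics, n_iter=50, seed=0):
    """Fit PLSA with EM on a document-word count matrix (list of lists).

    Returns (p_z_d, p_w_z), where p_z_d[d][z] = P(z|d) and p_w_z[z][w] = P(w|z).
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])
    p_z_d = [normalize([rng.random() for _ in range(n_topics)]) for _ in range(n_docs)]
    p_w_z = [normalize([rng.random() for _ in range(n_words)]) for _ in range(n_topics)]

    for _ in range(n_iter):
        # Expected counts n(d,w) * P(z|d,w), accumulated for the M-step.
        acc_w_z = [[0.0] * n_words for _ in range(n_topics)]
        acc_z_d = [[0.0] * n_topics for _ in range(n_docs)]
        for d in range(n_docs):
            for w in range(n_words):
                n = counts[d][w]
                if n == 0:
                    continue
                # E-step: P(z|d,w) is proportional to P(z|d) * P(w|z).
                post = normalize([p_z_d[d][z] * p_w_z[z][w] for z in range(n_topics)])
                for z in range(n_topics):
                    acc_w_z[z][w] += n * post[z]
                    acc_z_d[d][z] += n * post[z]
        # M-step: renormalize the expected counts into probabilities.
        p_w_z = [normalize(row) for row in acc_w_z]
        p_z_d = [normalize(row) for row in acc_z_d]
    return p_z_d, p_w_z


def prune(p_w_z, threshold):
    """Hypothetical post-processing sparsification (magnitude threshold, not an
    information-theoretic criterion): drop P(w|z) < threshold, then renormalize."""
    pruned = []
    for row in p_w_z:
        kept = [p if p >= threshold else 0.0 for p in row]
        # Keep the original row if pruning would remove every word in the topic.
        pruned.append(normalize(kept) if sum(kept) > 0 else list(row))
    return pruned


# Toy corpus: 4 documents x 4 vocabulary words (hypothetical counts).
counts = [
    [4, 3, 0, 0],
    [3, 4, 1, 0],
    [0, 0, 5, 3],
    [0, 1, 3, 4],
]
p_z_d, p_w_z = plsa(counts, n_topics=2, n_iter=100)
sparse = prune(p_w_z, threshold=0.05)
for z, row in enumerate(sparse):
    print(f"topic {z}:", [round(p, 3) for p in row])
```

The threshold of 0.05 is arbitrary; pruning after fitting corresponds loosely to the "post-processing" idea in the abstract, whereas the L2-sparsification approach described there removes parameters while the model is being estimated, which this sketch does not attempt.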