Sparsification for Topic Modeling and Applications to Information Retrieval

Bibliographic Details
Main Author: Muoh, Chibuike
Language: English
Published: Kent State University / OhioLINK 2009
Subjects: Computer Science; PLSA; information retrieval; clustering; topic model
Online Access: http://rave.ohiolink.edu/etdc/view?acc_num=kent1259206719
id ndltd-OhioLink-oai-etd.ohiolink.edu-kent1259206719
record_format oai_dc
spelling ndltd-OhioLink-oai-etd.ohiolink.edu-kent1259206719 2021-08-03T05:37:07Z Sparsification for Topic Modeling and Applications to Information Retrieval Muoh, Chibuike Computer Science PLSA information retrieval clustering topic model In this thesis we tackle the problem of improving the probabilistic topic models used in information retrieval and text mining tasks to discover the latent semantic structure within a corpus. Probabilistic topic models such as PLSA use only statistical inference from the corpus count data to reconstruct the underlying topic structure. However, we observe that the baseline PLSA topic model suffers from an over-fitting problem that can affect modeling accuracy. The techniques we outline in this thesis thus aim to produce a more selective generative model than the baseline while still retaining the important underlying topic structure of a corpus. We propose two new algorithms. L0-optimization is a post-processing sparsification approach for language modeling that uses information-theoretic measures. The L2-sparsification approach aims to reformulate the document likelihood equation to simultaneously remove spurious parameters from the model. 2009-11-30 English text Kent State University / OhioLINK http://rave.ohiolink.edu/etdc/view?acc_num=kent1259206719 unrestricted This thesis or dissertation is protected by copyright: all rights reserved. It may not be copied or redistributed beyond the terms of applicable copyright laws.
collection NDLTD
language English
sources NDLTD
topic Computer Science
PLSA
information retrieval
clustering
topic model
author Muoh, Chibuike
title Sparsification for Topic Modeling and Applications to Information Retrieval
publisher Kent State University / OhioLINK
publishDate 2009
url http://rave.ohiolink.edu/etdc/view?acc_num=kent1259206719