Fine-Grained Topic Models Using Anchor Words

Topic modeling is an effective tool for analyzing the thematic content of large collections of text. However, traditional probabilistic topic modeling is limited to a small number of topics (typically no more than hundreds). We introduce fine-grained topic models, which have large numbers of nuanced...

Bibliographic Details
Main Author: Lund, Jeffrey A.
Format: Others
Published: BYU ScholarsArchive 2018
Subjects: Topic Modeling; Anchor Words; Cross-reference Generation; Computer Sciences
Online Access: https://scholarsarchive.byu.edu/etd/7559
https://scholarsarchive.byu.edu/cgi/viewcontent.cgi?article=8559&context=etd
id ndltd-BGMYU2-oai-scholarsarchive.byu.edu-etd-8559
record_format oai_dc
spelling ndltd-BGMYU2-oai-scholarsarchive.byu.edu-etd-8559 2021-09-12T05:01:38Z 2018-12-20T08:00:00Z text application/pdf http://lib.byu.edu/about/copyright Theses and Dissertations BYU ScholarsArchive
collection NDLTD
format Others
sources NDLTD
topic Topic Modeling
Anchor Words
Cross-reference Generation
Computer Sciences
description Topic modeling is an effective tool for analyzing the thematic content of large collections of text. However, traditional probabilistic topic modeling is limited to a small number of topics (typically no more than hundreds). We introduce fine-grained topic models, which have large numbers of nuanced and specific topics. We demonstrate that fine-grained topic models enable use cases not possible with current topic modeling techniques, including an automatic cross-referencing task in which short passages of text are linked to other topically related passages. We do so by leveraging anchor methods, a recent class of topic model based on non-negative matrix factorization in which each topic is anchored by a single word. We explore extensions of the anchor algorithm, including tandem anchors, which relax the restriction that anchors be formed of single words. By doing so, we are able to produce anchor-based topic models with thousands of fine-grained topics. We also develop metrics for evaluating token-level topic assignments and use those metrics to improve the accuracy of fine-grained topic models.
author Lund, Jeffrey A.
title Fine-Grained Topic Models Using Anchor Words
publisher BYU ScholarsArchive
publishDate 2018