Incorporating Name Entity Recognition Rules in News Topic Model

Bibliographic Details
Main Authors: HSIAO, WEI-CHING, 蕭維慶
Other Authors: HUANG, CHUEN-MIN
Format: Others
Language: zh-TW
Published: 2016
Online Access: http://ndltd.ncl.edu.tw/handle/249v39
id ndltd-TW-104YUNT0396049
record_format oai_dc
spelling ndltd-TW-104YUNT0396049
2019-05-15T22:53:47Z
http://ndltd.ncl.edu.tw/handle/249v39
Incorporating Name Entity Recognition Rules in News Topic Model
命名實體辨識規則應用於主題模型特徵詞萃取研究
HSIAO, WEI-CHING
蕭維慶
Master's === National Yunlin University of Science and Technology === Department of Information Management === 104
HUANG, CHUEN-MIN
黃純敏
2016
學位論文 ; thesis
58
zh-TW
collection NDLTD
sources NDLTD
description Master's === National Yunlin University of Science and Technology === Department of Information Management === 104 === Unstructured information is growing rapidly, and topic models are widely used to identify topics in unstructured corpora. Purely unsupervised models, however, often produce topics that are hard to interpret in applications. In recent years, a number of knowledge-based models have been proposed that let the user supply prior domain knowledge to produce more coherent and meaningful topics. The drawback of these knowledge-based topic models is that they require the user to know the domain well, which rarely holds in practice: in most cases, people use a topic model precisely to discover latent topics. Knowledge-based topic models also have difficulty handling large amounts of data. This study uses syntactic extraction rules to extract named entities as LDA feature terms, and evaluates the resulting named-entity LDA against Unigram-LDA, Compound-LDA, and Mixture LDA using the Coherence Measure, UMass Topic Coherence, and efficiency testing. The results show that named-entity LDA runs more efficiently than the other models and produces interpretable topics. It scores slightly below the other LDA models on the UMass measure, and its Coherence Measure is 0.97.