Discovering Topics and Their Evolution in Corpus Based on Random-Walk with Restart and Named-Entity Feature
Master's === National Chung Hsing University === Department of Computer Science and Engineering === 102 === In the decision-making process, people usually need to collect information and topics related to their problems as quickly as possible. Because of the rapid development of the Web, the amount of data grows quickly, making it difficult to classify and to f...
Main Authors: | Chi-Hao Chen, 陳期皓 |
---|---|
Other Authors: | I-En Liao |
Format: | Others |
Language: | zh-TW |
Published: | 2014 |
Online Access: | http://ndltd.ncl.edu.tw/handle/27386888925155926267 |
id |
ndltd-TW-102NCHU5394063 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-TW-102NCHU5394063 2017-08-27T04:29:50Z http://ndltd.ncl.edu.tw/handle/27386888925155926267 Discovering Topics and Their Evolution in Corpus Based on Random-Walk with Restart and Named-Entity Feature 基於隨機漫步與詞類權重加權之文件主題偵測與歷程分析模型 Chi-Hao Chen 陳期皓 Master's National Chung Hsing University Department of Computer Science and Engineering 102 I-En Liao 廖宜恩 2014 thesis 47 zh-TW |
collection |
NDLTD |
language |
zh-TW |
format |
Others |
sources |
NDLTD |
description |
Master's === National Chung Hsing University === Department of Computer Science and Engineering === 102 === When making decisions, people usually need to collect information and topics related to their problems as quickly as possible. Because of the rapid development of the Web, the amount of data grows quickly, making it difficult to classify documents and find related topics or events manually within the desired time. In this thesis, we propose a topic detection and topic evolution modeling system based on random walk with restart (RWR) and named-entity features. The system automatically discovers the topics discussed in a corpus and generates topic/event evolution information based on the temporal and content similarity between topics/events. It uses information from Wikipedia pages to categorize document terms into five predefined named-entity classes and assigns different weights to those classes in order to generate more distinctive features for each document. After the features of each document are extracted, a graph generation module computes graph-based similarity between text documents using cosine similarity. The RWR-based clustering algorithm then detects the topics/events within those documents by grouping documents that share a topic. Finally, the system generates topic/event evolution information for the given documents by considering the similarity of the average features of the documents each topic contains, the similarity of representative document features, and the temporal similarity between topics. This topic evolution information can be used to support the decision-making process.
The experimental results showed that the proposed algorithm outperforms both K-Means and Latent Dirichlet Allocation (LDA) in terms of F1-measure on real-world data. In addition, the RWR-based clustering algorithm provides better topic detection quality than the other approaches in its ability to handle multi-topic documents. The experimental results also showed that the proposed model can generate suitable topic/event evolution information.
|
author2 |
I-En Liao |
author_facet |
I-En Liao Chi-Hao Chen 陳期皓 |
author |
Chi-Hao Chen 陳期皓 |
spellingShingle |
Chi-Hao Chen 陳期皓 Discovering Topics and Their Evolution in Corpus Based on Random-Walk with Restart and Named-Entity Feature |
author_sort |
Chi-Hao Chen |
title |
Discovering Topics and Their Evolution in Corpus Based on Random-Walk with Restart and Named-Entity Feature |
title_short |
Discovering Topics and Their Evolution in Corpus Based on Random-Walk with Restart and Named-Entity Feature |
title_full |
Discovering Topics and Their Evolution in Corpus Based on Random-Walk with Restart and Named-Entity Feature |
title_fullStr |
Discovering Topics and Their Evolution in Corpus Based on Random-Walk with Restart and Named-Entity Feature |
title_full_unstemmed |
Discovering Topics and Their Evolution in Corpus Based on Random-Walk with Restart and Named-Entity Feature |
title_sort |
discovering topics and their evolution in corpus based on random-walk with restart and named-entity feature |
publishDate |
2014 |
url |
http://ndltd.ncl.edu.tw/handle/27386888925155926267 |
work_keys_str_mv |
AT chihaochen discoveringtopicsandtheirevolutionincorpusbasedonrandomwalkwithrestartandnamedentityfeature AT chénqīhào discoveringtopicsandtheirevolutionincorpusbasedonrandomwalkwithrestartandnamedentityfeature AT chihaochen jīyúsuíjīmànbùyǔcílèiquánzhòngjiāquánzhīwénjiànzhǔtízhēncèyǔlìchéngfēnxīmóxíng AT chénqīhào jīyúsuíjīmànbùyǔcílèiquánzhòngjiāquánzhīwénjiànzhǔtízhēncèyǔlìchéngfēnxīmóxíng |
_version_ |
1718518961024270336 |
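
The pipeline outlined in the description (class-weighted term features, a cosine-similarity document graph, then random walk with restart) can be illustrated with a minimal sketch. This is not the thesis's implementation: the record does not give the five named-entity classes, their weights, the restart probability, or the clustering rule, so `CLASS_WEIGHTS`, `restart=0.15`, the function names, and the toy documents below are all assumptions made for illustration. The sketch only computes RWR proximity scores from one seed document; how the thesis turns such scores into clusters is not stated in this record.

```python
import math

# Hypothetical class weights: the record says five named-entity classes are
# weighted differently but does not list the classes or the weights.
CLASS_WEIGHTS = {"person": 2.0, "place": 1.5, "other": 1.0}


def weighted_vector(term_counts, term_class):
    """Scale each raw term count by the weight of the term's class."""
    return {
        term: count * CLASS_WEIGHTS[term_class.get(term, "other")]
        for term, count in term_counts.items()
    }


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def similarity_graph(vectors):
    """Dense adjacency matrix of pairwise cosine similarities (no self-loops)."""
    n = len(vectors)
    return [
        [0.0 if i == j else cosine(vectors[i], vectors[j]) for j in range(n)]
        for i in range(n)
    ]


def rwr_scores(graph, seed, restart=0.15, iters=100):
    """Random walk with restart from `seed`, by power iteration.

    At each step the walker follows an edge with probability 1 - restart
    (chosen in proportion to edge weight) or jumps back to the seed.
    Nodes with no edges send their probability mass back to the seed.
    """
    n = len(graph)
    out_weight = [sum(row) for row in graph]
    p = [1.0 if i == seed else 0.0 for i in range(n)]
    for _ in range(iters):
        dangling = sum(p[j] for j in range(n) if out_weight[j] == 0)
        nxt = []
        for i in range(n):
            walk = sum(
                graph[j][i] / out_weight[j] * p[j]
                for j in range(n)
                if out_weight[j] > 0
            )
            score = (1 - restart) * walk
            if i == seed:
                score += restart + (1 - restart) * dangling
            nxt.append(score)
        p = nxt
    return p


# Toy corpus (invented for illustration): documents 0 and 1 share a topic,
# document 2 overlaps with them only through the place name.
term_class = {"taipei": "place"}
docs = [
    {"taipei": 2, "election": 1},
    {"taipei": 1, "election": 2},
    {"typhoon": 3, "rain": 1, "taipei": 1},
]
vectors = [weighted_vector(d, term_class) for d in docs]
graph = similarity_graph(vectors)
scores = rwr_scores(graph, seed=0)
print(scores)
```

In this sketch a higher score means a document is more closely tied to the seed document in the similarity graph. Thresholding or ranking these scores would be one way to group documents under a topic, though that step is likewise an assumption and not taken from the record.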