The Application of Keywords Extraction
Master's thesis (碩士), National Chengchi University (國立政治大學), Department of Statistics (統計學系), academic year 107.
Main Authors: | Hsu, Cheng-En (許承恩) |
---|---|
Other Authors: | Yue, Ching-Syang (余清祥); Cheng, Wen-Huei (鄭文惠) |
Format: | Others |
Language: | zh-TW |
Published: | 2019 |
Online Access: | http://ndltd.ncl.edu.tw/handle/uw62va |
id | ndltd-TW-107NCCU5337019 |
---|---|
record_format | oai_dc |
title (Chinese) | 關鍵詞偵測方法的比較與應用 (Comparison and Application of Keyword Detection Methods) |
advisors | Yue, Ching-Syang (余清祥); Cheng, Wen-Huei (鄭文惠) |
type | 學位論文 ; thesis |
collection | NDLTD |
language | zh-TW |
format | Others |
sources | NDLTD |
description |
Text mining became one of the popular research areas after IBM proposed the term Big Data in 2010. Since then, many texts have been digitized, and more scholars have devoted themselves to developing quantitative tools that assign semantic meaning to texts without the help of human experts. Provided the texts are properly structured, such tools greatly increase the efficiency of reading huge amounts of text. Structuring texts involves several steps, such as keyword extraction and sentiment analysis. Keyword extraction is a critical step: keywords can be used to summarize an article or to compare the writing styles of two authors.
The goal of this study is to propose a new unsupervised keyword extraction method and compare it with several frequently used methods, including term frequency-inverse document frequency (TF-IDF), logistic regression, and other machine learning models. In the empirical analysis, we considered three modern Chinese texts: one from the People’s Daily (514 articles, 1971-1989) and two from New Youth magazine (volumes 7 and 8, 1919-1920). Each text contains approximately 400,000 to 600,000 words. We asked scholars of history to select keywords from the three texts and treated their selections as the true keywords. We then applied the different keyword extraction methods to these texts and compared the results. The proposed method performed best among all unsupervised methods and was competitive with the supervised methods.
|
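The abstract lists TF-IDF as one of the baseline methods. The following is a minimal sketch of a generic TF-IDF keyword ranker in Python, with a simple precision check against a reference keyword set such as the scholars' selections. It assumes each document has already been segmented into words (Chinese text needs a segmentation step first, which is not shown) and uses the plain tf × log(N/df) weighting. The function names `tfidf_keywords` and `precision`, the alphabetical tie-breaking, and the choice of precision as the metric are illustrative assumptions; this is not the thesis's proposed unsupervised method or its evaluation procedure.

```python
import math
from collections import Counter


def tfidf_keywords(docs, top_k=5):
    """Rank the terms of each document by TF-IDF and return the top_k per document.

    docs: list of documents, each already segmented into a list of words
    (hypothetical input format; segmentation is assumed to happen beforehand).
    """
    n_docs = len(docs)
    # Document frequency: the number of documents that contain each term.
    df = Counter()
    for doc in docs:
        df.update(set(doc))

    results = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        scores = {}
        for term, count in counts.items():
            tf = count / total                 # relative term frequency
            idf = math.log(n_docs / df[term])  # unsmoothed inverse document frequency
            scores[term] = tf * idf
        # Highest score first; ties broken alphabetically so output is deterministic.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        results.append(ranked[:top_k])
    return results


def precision(extracted, true_keywords):
    """Fraction of extracted keywords that appear in the reference keyword set."""
    if not extracted:
        return 0.0
    hits = sum(1 for term in extracted if term in true_keywords)
    return hits / len(extracted)


if __name__ == "__main__":
    # Toy, pre-segmented documents (made up for illustration only).
    docs = [
        ["民主", "科學", "青年", "民主", "文學"],
        ["革命", "工人", "青年", "革命", "生產"],
        ["科學", "文學", "青年", "白話", "白話"],
    ]
    for keywords in tfidf_keywords(docs, top_k=2):
        print(keywords, precision(keywords, {"民主", "革命", "白話"}))
```

Because IDF is computed across the whole collection, a word that appears in every document (such as 青年 in the toy data) scores zero, which is why TF-IDF favors terms that are frequent in one document but rare elsewhere.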
author2 | Yue, Ching-Syang |
author | Hsu, Cheng-En 許承恩 |
title | The Application of Keywords Extraction |
publishDate | 2019 |
url | http://ndltd.ncl.edu.tw/handle/uw62va |