The Application of Keywords Extraction

Master's === National Chengchi University === Department of Statistics === 107 === Text mining has become a popular research area since IBM proposed the term Big Data in 2010. Since then, many texts have been digitized, and more scholars have devoted themselves to developing quantitative tools that give texts semantic meaning without the help...

Bibliographic Details
Main Authors: Hsu, Cheng-En, 許承恩
Other Authors: Yue, Ching-Syang
Format: Others
Language: zh-TW
Published: 2019
Online Access: http://ndltd.ncl.edu.tw/handle/uw62va
id ndltd-TW-107NCCU5337019
record_format oai_dc
spelling ndltd-TW-107NCCU5337019 2019-11-28T05:23:26Z http://ndltd.ncl.edu.tw/handle/uw62va The Application of Keywords Extraction 關鍵詞偵測方法的比較與應用 Hsu, Cheng-En 許承恩 Master's National Chengchi University Department of Statistics 107 Yue, Ching-Syang Cheng, Wen-Huei 余清祥 鄭文惠 2019 學位論文 ; thesis 57 zh-TW
collection NDLTD
language zh-TW
format Others
sources NDLTD
description Master's === National Chengchi University === Department of Statistics === 107 === Text mining has become a popular research area since IBM proposed the term Big Data in 2010. Since then, many texts have been digitized, and more scholars have devoted themselves to developing quantitative tools that give texts semantic meaning without the help of human experts. Provided that the texts are properly structured, such tools greatly increase the efficiency of reading a huge amount of text. Structuring texts involves several steps, such as keyword extraction and sentiment analysis. Keyword extraction is critical, because keywords can be used to summarize an article and to compare two authors’ writing styles. The goal of this study is to propose a new unsupervised method for extracting keywords and to compare it with some frequently used methods, including term frequency-inverse document frequency (TF-IDF), logistic regression, and machine learning models. In the empirical analysis, we considered three modern Chinese texts, one from People’s Daily (514 articles in 1971-1989) and two from New Youth Magazine (volumes 7 and 8 in 1919-1920). The word counts of the texts range from approximately 400,000 to 600,000. We asked historians to pick out keywords from these three texts and treated them as the true keywords. We then applied different keyword extraction methods to these texts and compared their results. We found that the proposed method performs best among all unsupervised methods and is competitive with the supervised methods.
author2 Yue, Ching-Syang
author_facet Yue, Ching-Syang
Hsu, Cheng-En
許承恩
author Hsu, Cheng-En
許承恩
title The Application of Keywords Extraction
publishDate 2019
url http://ndltd.ncl.edu.tw/handle/uw62va