Design of a Chinese Opinion Mining System

Master's === Tamkang University === Master's Program, Department of Computer Science and Information Engineering === 100 === Unlike English, Chinese is written without spaces between words, so using a POS tagger or parser to find opinion words easily leads to errors. Therefore, when capturing opinion words by using the thesaurus (lexi...

Full description

Bibliographic Details
Main Authors: Lee Chien, 簡立
Other Authors: Rui-Dong Chiang
Format: Others
Language: zh-TW
Published: 2012
Online Access: http://ndltd.ncl.edu.tw/handle/99856694618811977409
id ndltd-TW-100TKU05392077
record_format oai_dc
spelling ndltd-TW-100TKU05392077 2015-10-13T21:27:35Z http://ndltd.ncl.edu.tw/handle/99856694618811977409 Design of a Chinese Opinion Mining System 中文意見探勘系統設計 Lee Chien 簡立 碩士 淡江大學 資訊工程學系碩士班 100 Rui-Dong Chiang 蔣璿東 2012 學位論文 ; thesis 71 zh-TW
collection NDLTD
language zh-TW
format Others
sources NDLTD
description Master's === Tamkang University === Master's Program, Department of Computer Science and Information Engineering === 100 === Unlike English, Chinese is written without spaces between words, so using a POS tagger or parser to find opinion words easily leads to errors. Therefore, when capturing opinion words with a lexicon (thesaurus), this study applies a proposed exclusion-word method to improve the precision of opinion word capture. Because each field has its own terminology and idioms (opinion words and exclusion words), ordinary dictionaries can hardly cover all the opinion words of a specific field. For a specific field, however, as long as the training data are sufficient, most of the opinion words and exclusion words missing from the dictionaries can be captured. The out-of-dictionary opinion words and exclusion words that the training set still does not cover are few, remain at a stable level, and tend to be infrequently used. The experiments use data from two different but similar telecommunications fields on Mobile01. Because the lexicon approach captures every opinion word and exclusion word that is in the dictionaries, only those outside the dictionaries have to be found by manual tagging, which is time-consuming and labor-intensive. Given the stability of new out-of-dictionary opinion words and exclusion words, this study designs a two-stage lexicon training method to address this problem. In the first stage, the opinion words and exclusion words in the training data are captured by manual, semi-automated tagging. In the second stage, once the system is online, the dictionaries are used directly to capture the opinion words and exclusion words in articles, and the captured words are then manually inspected for accuracy. According to the experimental data, the second-stage training procedure saves a great deal of manual tagging time.
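The exclusion-word idea in the abstract can be illustrated with a minimal Python sketch. It rests on an assumption the abstract does not spell out: that an exclusion word is a longer lexicon entry containing an opinion word in a non-opinion sense (for example 好像, "seems", which contains 好, "good"). The function names, the sample lexicons, and the sample sentence are all hypothetical and are not taken from the thesis.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int, str]  # (start index, end index, matched word)


def find_spans(text: str, words: Set[str]) -> List[Span]:
    """Return every occurrence of every lexicon word in text, sorted by position."""
    spans: List[Span] = []
    for word in words:
        if not word:
            continue  # ignore empty lexicon entries
        start = text.find(word)
        while start != -1:
            spans.append((start, start + len(word), word))
            start = text.find(word, start + 1)
    return sorted(spans)


def capture_opinion_words(
    text: str, opinion_words: Set[str], exclusion_words: Set[str]
) -> List[Tuple[int, str]]:
    """Lexicon matching without word segmentation (assumed behavior).

    An opinion-word match is dropped when it lies entirely inside a
    match of an exclusion word.
    """
    excluded = find_spans(text, exclusion_words)
    hits: List[Tuple[int, str]] = []
    for start, end, word in find_spans(text, opinion_words):
        covered = any(ex_start <= start and end <= ex_end
                      for ex_start, ex_end, _ in excluded)
        if not covered:
            hits.append((start, word))
    return hits


# Hypothetical lexicons and sentence, for illustration only.
OPINION_WORDS = {"好", "快"}
EXCLUSION_WORDS = {"好像"}
SENTENCE = "這支手機好像很快，螢幕也很好"

print(capture_opinion_words(SENTENCE, OPINION_WORDS, EXCLUSION_WORDS))
```

With an empty exclusion lexicon, the 好 inside 好像 would also be reported as an opinion word; this is the kind of false match the exclusion lexicon is meant to remove under the stated assumption.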
author2 Rui-Dong Chiang
author_facet Rui-Dong Chiang
Lee Chien
簡立
author Lee Chien
簡立
spellingShingle Lee Chien
簡立
Design of a Chinese Opinion Mining System
author_sort Lee Chien
title Design of a Chinese Opinion Mining System
publishDate 2012
url http://ndltd.ncl.edu.tw/handle/99856694618811977409
work_keys_str_mv AT leechien designofachineseopinionminingsystem
AT jiǎnlì designofachineseopinionminingsystem
AT leechien zhōngwényìjiàntànkānxìtǒngshèjì
AT jiǎnlì zhōngwényìjiàntànkānxìtǒngshèjì
_version_ 1718064115010764800