Design of a Chinese Opinion Mining System

碩士 === 淡江大學 === 資訊工程學系碩士班 === 100 === Since the Chinese grammatical structure is different from English, there is no interval space in between Chinese words. Using POS or Parser in search of opinion words can easily lead to errors. Therefore, when capturing opinion words by using the thesaurus (lexi...

Full description

Bibliographic Details
Main Authors: Lee Chien, 簡立
Other Authors: Rui-Dong Chiang
Format: Others
Language:zh-TW
Published: 2012
Online Access:http://ndltd.ncl.edu.tw/handle/99856694618811977409
Description
Summary:碩士 === 淡江大學 === 資訊工程學系碩士班 === 100 === Since the Chinese grammatical structure is different from English, there is no interval space in between Chinese words. Using POS or Parser in search of opinion words can easily lead to errors. Therefore, when capturing opinion words by using the thesaurus (lexicon) way, this study uses the proposed exclusion word method to improve the opinion word capturing precision. As each of the different fields has different terminologies or idioms (opinion words and exclusion words), ordinary dictionaries can hardly cover all the opinion words in a specific field. However, for a specific field, as long as the training data are sufficient, most of the opinion words and exclusion words outside the dictionaries can be captured. The opinion words and exclusion words outside the dictionaries that have not been included in the training set are few, and at a stable state. Moreover, they are often opinion words and exclusion words that are not frequently used. This paper uses the experimental data of two different but similar fields of Mobile01 telecommunications. As this paper uses the thesaurus/lexicon way to capture the opinion words and exclusion words, all the opinion words and exclusion words in dictionaries can be captured. The opinion words and exclusion words outside the dictionaries can be determined only by manual tagging, which is time and labor consuming. Therefore, according to the stability of the new opinion words and exclusion words outside the dictionaries, this study attempts to design a two-stage lexicon training method to solve this problem. Regarding the proposed two-stage lexicon training method, the first stage is to capture the opinion words or exclusion words of training data by manual semi-automated tagging. The second stage is to directly use the dictionaries to capture the opinion words or exclusion words of the articles when the system is online before manually inspecting the accuracy of the captured opinion words and exclusion words. According to the experimental data, the training procedure of the second stage can save a great deal of time for manual tagging.