Combining the Supervised and Unsupervised Approaches to Identifying Opinion Holders in News


Bibliographic Details
Main Authors: Chang, Yi-Hao, 張益豪
Other Authors: Hou, Wen-Juan
Format: Others
Language: zh-TW
Published: 2016
Online Access: http://ndltd.ncl.edu.tw/handle/50132841929301847980
Description
Summary: Master's === National Taiwan Normal University === Department of Computer Science and Information Engineering === 104 === Opinion mining helps us automatically extract useful subjective information from a large number of reliable texts. An opinion sentence can be decomposed into four parts: opinion topic, opinion holder, opinion claim, and opinion sentiment. Our goal is to identify opinion holders. This study combines supervised and unsupervised learning approaches to identify the article author and the opinion holders. The work is divided into two phases: identifying the article author, and identifying the holders of opinion sentences in the labeled corpus. Opinion holder identification aims to capture the expressions that denote the persons or organizations holding the opinion in subjective opinion sentences. The supervised approach is trained on several manually annotated corpora of online news articles. Preprocessing applies natural language processing techniques such as segmentation, part-of-speech tagging, and named entity recognition. Our feature analysis draws on both machine learning, namely the support vector machine (SVM), and unsupervised pattern recognition techniques. Different SVM models are evaluated in cross-validation experiments. The proposed features consist of the lexical feature, part-of-speech feature, punctuation mark feature, named entity feature, syntactic feature, position feature, phrase composition feature, and opinion word feature. The study also addresses the problem of multiple opinion holder candidates appearing in a single sentence. The proposed approach includes unsupervised extraction methods that detect opinion holders without labeled training data. Manually written rules are used to repair incomplete holder representations. Furthermore, the Hobbs algorithm is applied to resolve anaphora.
Our approach is tested on an annotated news corpus with 10-fold cross-validation and feature deletion analysis, obtaining F1 scores of 91.58% for the task of extracting the author's opinion and 71.83% for the task of opinion holder identification. These results indicate good performance on both tasks.
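
The evaluation protocol named in the summary (10-fold cross-validation scored by F1) can be illustrated with a minimal Python sketch. It is not the thesis's implementation. The function names `k_fold_indices` and `f1_score`, the contiguous fold layout, and the exact-string matching of holders with micro-averaging are all assumptions made for illustration; the record does not state how folds were drawn or how a predicted holder was matched against the annotation.

```python
from typing import Iterator, List, Sequence, Set, Tuple


def k_fold_indices(n_items: int, k: int = 10) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists for k contiguous folds.

    Assumption: folds are contiguous and unshuffled; the record does not
    say how the thesis partitioned its corpus.
    """
    if not 2 <= k <= n_items:
        raise ValueError("k must be between 2 and n_items")
    base, extra = divmod(n_items, k)
    start = 0
    for fold in range(k):
        # The first `extra` folds get one additional item each.
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_items))
        yield train, test
        start += size


def f1_score(gold: Sequence[Set[str]], predicted: Sequence[Set[str]]) -> float:
    """Micro-averaged F1 over per-sentence sets of holder strings.

    Assumption: a predicted holder counts as correct only when it matches
    an annotated holder of the same sentence exactly.
    """
    true_positives = sum(len(g & p) for g, p in zip(gold, predicted))
    n_predicted = sum(len(p) for p in predicted)
    n_gold = sum(len(g) for g in gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / n_predicted
    recall = true_positives / n_gold
    return 2 * precision * recall / (precision + recall)
```

For example, with gold holders `[{"A"}, {"B", "C"}]` and predictions `[{"A"}, {"B"}]`, precision is 1 and recall is 2/3, so `f1_score` returns roughly 0.8. In a full experiment, a model would be trained on each `train` split and scored on the matching `test` split, and the per-fold scores averaged.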