Sentiment analysis for patient-authored journal text


Bibliographic Details
Main Authors: Ting-Hsun Chen, 陳庭勛
Other Authors: Shih-Wen Ke
Format: Others
Language: zh-TW
Published: 2015
Online Access: http://ndltd.ncl.edu.tw/handle/mdgd52
Description
Summary: Master's thesis === Chung Yuan Christian University (中原大學) === Graduate Institute of Information and Computer Engineering (資訊工程研究所) === 103 === This research applies sentiment analysis (SA) to the online medical community, where the text under study is patient-authored text (PAT). Analyzing patients' activity in a medical community can yield valuable information about their emotional condition. Most current SA research targets social networking sites, movie reviews, product comments, and similar sources. Some techniques from previous studies can be applied to the medical community domain, but the domain still differs in some respects, such as references to drugs, symptoms, or diseases. Our data source is the well-known foreign medical community website "www.patientslikeme.com". This study explores whether existing SA standards and techniques are applicable to patient-authored text, and presents classification results with two labels (positive, negative) and three labels (positive, neutral, negative). We use two sentiment analysis methods: a natural language model (pattern-based) and machine learning (support vector machine, SVM). The pattern-based approach uses Adv Verb Combine & Adv Adj Combine (AVC & AAC) and Adv Verb Adj Combine - SentiWordNet (AVAC-SWN) to classify texts by rules. On the machine learning side, SVM has been frequently used in SA. We propose a Semantic Weighting method that modifies the semantic weight of the text, using the sentiment score generated by the pattern-based approach to change the weights. Finally, we compare it with an SVM baseline that uses unigrams as features and frequency as weights. The results show that, for both two-label and three-label classification, AVC & AAC performs better than AVAC-SWN. The reason is that AVAC-SWN is more suitable for product reviews than for PAT. For the SVM experiments, we divide the data into three types: 1) all text, 2) medical-related text, and 3) text expressing simple sentiment.
The results show that applying Semantic Weighting to all text and to text expressing simple sentiment outperforms the baseline. For medical-related text the method is ineffective overall: classification of neutral and negative sentiment improves, but classification of positive sentiment declines. This type of data comprises 50% negatively labeled text, which suggests that text discussing medical matters is more complex or obscure. In future work we will continue to study the relationship between medical-related text and the way sentiment is expressed.
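The weighting idea in the summary can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract does not give the AVC & AAC rules, the lexicon, or the weighting formula, so the toy lexicons `ADJ_SCORES` and `ADV_FACTORS`, the function names, and the rule "frequency times 1 + |score|" are all assumptions made for illustration. The sketch contrasts the baseline (unigram features with frequency weights) with a variant whose weights are adjusted by a pattern-based sentiment score.

```python
from collections import Counter

# Hypothetical toy lexicons; the abstract does not list the real ones.
ADJ_SCORES = {"good": 0.6, "bad": -0.6, "painful": -0.7, "better": 0.5}
ADV_FACTORS = {"very": 1.5, "slightly": 0.5, "not": -1.0}


def pattern_scores(tokens):
    """Score sentiment words; a directly preceding adverb scales the score.

    If a word occurs more than once, the score of its last occurrence is kept.
    """
    scores = {}
    for i, tok in enumerate(tokens):
        if tok in ADJ_SCORES:
            factor = ADV_FACTORS.get(tokens[i - 1], 1.0) if i > 0 else 1.0
            scores[tok] = ADJ_SCORES[tok] * factor
    return scores


def frequency_weights(tokens):
    """Baseline weighting: each unigram is weighted by its raw frequency."""
    return dict(Counter(tokens))


def semantic_weights(tokens, boost=1.0):
    """Assumed variant: scale each frequency by 1 + boost * |sentiment score|."""
    freq = frequency_weights(tokens)
    scores = pattern_scores(tokens)
    return {t: f * (1.0 + boost * abs(scores.get(t, 0.0))) for t, f in freq.items()}


tokens = "the new drug made me feel very good but the pain is bad".split()
baseline = frequency_weights(tokens)
weighted = semantic_weights(tokens)
```

Under these assumptions, words without a sentiment score keep their frequency weight, while sentiment-bearing words such as "good" and "bad" receive larger weights. Either dictionary could then be turned into a feature vector for an SVM classifier.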