Chinese Sentiment Word Acquisition and Its Applications to Opinion Extraction
碩士 === 國立臺灣大學 === 資訊工程學研究所 === 93 === Opinions may be explicitly or implicitly embedded in documents. They provide useful information and viewpoints for improving government services and company products. We define an opinion as a statement that is expressed towards a topic and contains sentiment....
Main Authors: | Tung-Ho Wu, 吳東和 |
---|---|
Other Authors: | 陳信希 |
Format: | Others |
Language: | en_US |
Published: | 2005 |
Online Access: | http://ndltd.ncl.edu.tw/handle/64039269154360333652 |
id |
ndltd-TW-093NTU05392080 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-TW-093NTU05392080 2015-12-21T04:04:14Z http://ndltd.ncl.edu.tw/handle/64039269154360333652 Chinese Sentiment Word Acquisition and Its Applications to Opinion Extraction 中文情緒詞彙自動學習及在意見擷取之應用 Tung-Ho Wu 吳東和 碩士 國立臺灣大學 資訊工程學研究所 93 陳信希 2005 學位論文 ; thesis 48 en_US |
collection |
NDLTD |
language |
en_US |
format |
Others |
sources |
NDLTD |
description |
碩士 === 國立臺灣大學 === 資訊工程學研究所 === 93 === Opinions may be explicitly or implicitly embedded in documents. They provide useful information and viewpoints for improving government services and company products. We define an opinion as a statement that is expressed towards a topic and contains sentiment. Sentiment words determine the opinion type of an opinion passage and the overall opinion tendency of a document, so they are the key features in opinion extraction.
We propose three approaches for determining whether an unknown word is positive, negative, or non-sentiment: the Thesaurus-Based Approach, the Character-Based Approach, and the Combined Approach. The Thesaurus-Based Approach uses synonym information to classify an unknown word. The Character-Based Approach computes the sentiment score of a Chinese word from its component characters and classifies the word by that score. The Combined Approach uses both synonym information and sentiment scores, and it performs best of the three. Under strict human assessment, its F-measure is 73.18% for verbs and 63.75% for nouns, with an average F-measure of 70.40%. Finally, we propose the Sentiment Miner, based on the Combined Approach, to acquire new positive and negative sentiment words from documents.
For opinion extraction, we propose the Passage Level Algorithm, which uses sentiment words and context information to detect the opinion passages in a document. We also propose the Document Level Algorithm, which determines the overall opinion tendency of a document from the opinion passages it contains. In experiments, the best F-measure is 62.16% at the passage level and 76.56% at the document level.
|
author2 |
陳信希 |
author_facet |
陳信希 Tung-Ho Wu 吳東和 |
author |
Tung-Ho Wu 吳東和 |
author_sort |
Tung-Ho Wu |
title |
Chinese Sentiment Word Acquisition and Its Applications to Opinion Extraction |
publishDate |
2005 |
url |
http://ndltd.ncl.edu.tw/handle/64039269154360333652 |
work_keys_str_mv |
AT tunghowu chinesesentimentwordacquisitionanditsapplicationstoopinionextraction AT wúdōnghé chinesesentimentwordacquisitionanditsapplicationstoopinionextraction AT tunghowu zhōngwénqíngxùcíhuìzìdòngxuéxíjízàiyìjiànxiéqǔzhīyīngyòng AT wúdōnghé zhōngwénqíngxùcíhuìzìdòngxuéxíjízàiyìjiànxiéqǔzhīyīngyòng |
_version_ |
1718154560906723328 |
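
Illustrative note on the Character-Based Approach described in the abstract: the record does not give the scoring formula, so the following is only a minimal sketch under stated assumptions, not the thesis's actual method. It assumes that a character's score is the normalized difference between its relative frequency in a positive and a negative seed lexicon, that a word's score is the mean of its characters' scores, and that a symmetric threshold separates positive, negative, and non-sentiment words. The seed words, the function names, and the threshold value are all hypothetical.

```python
from collections import Counter

# Hypothetical seed lexicons, invented for illustration only.
POSITIVE_SEEDS = ["快樂", "歡樂", "喜歡", "美好"]
NEGATIVE_SEEDS = ["悲傷", "痛苦", "傷害", "厭惡"]


def char_counts(words):
    """Count how often each character occurs across a list of words."""
    return Counter(ch for word in words for ch in word)


def char_score(ch, pos_counts, neg_counts):
    """Score one character in [-1, 1] (assumed formula, not from the thesis).

    Relative frequencies are used so that lexicons of different sizes
    are comparable. A character seen in neither lexicon scores 0.0.
    """
    pos_total = sum(pos_counts.values())
    neg_total = sum(neg_counts.values())
    p = pos_counts[ch] / pos_total if pos_total else 0.0
    n = neg_counts[ch] / neg_total if neg_total else 0.0
    if p + n == 0:
        return 0.0
    return (p - n) / (p + n)


def word_score(word, pos_counts, neg_counts):
    """Average the character scores of a word (assumed aggregation)."""
    if not word:
        return 0.0
    total = sum(char_score(ch, pos_counts, neg_counts) for ch in word)
    return total / len(word)


def classify(word, pos_counts, neg_counts, threshold=0.3):
    """Map a word score to one of three labels; the threshold is hypothetical."""
    score = word_score(word, pos_counts, neg_counts)
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "non-sentiment"


pos_counts = char_counts(POSITIVE_SEEDS)
neg_counts = char_counts(NEGATIVE_SEEDS)

for unknown_word in ["喜樂", "傷痛", "電腦"]:
    print(unknown_word, classify(unknown_word, pos_counts, neg_counts))
```

Averaging over characters keeps the score independent of word length, and a word whose characters never appear in either seed lexicon falls into the non-sentiment class. A combined variant in the spirit of the abstract would additionally consult synonym information, which this sketch leaves out.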