A Comparative Study of Feature Extraction Methods for Sarcasm Detection
Master's === National Taipei University of Technology === International Program in Electrical Engineering and Computer Science === 106 === The increasing popularity of sentiment analysis has led many companies to use it in after-sales service to improve the quality of their products. Among the tasks in sentiment analysis, sarcasm detection is difficult because features are lacking. Sa...
Main Author: | rahmat fadli isnanto |
---|---|
Other Authors: | 王正豪 |
Format: | Others |
Language: | en_US |
Published: | 2018 |
Online Access: | http://ndltd.ncl.edu.tw/handle/we52k5 |
id |
ndltd-TW-106TIT0570A009 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-TW-106TIT0570A009 2019-10-03T03:40:48Z http://ndltd.ncl.edu.tw/handle/we52k5 A Comparative Study of Feature Extraction Methods for Sarcasm Detection rahmat fadli isnanto Master's National Taipei University of Technology International Program in Electrical Engineering and Computer Science 106 王正豪 2018 thesis 58 en_US |
collection |
NDLTD |
language |
en_US |
format |
Others |
sources |
NDLTD |
description |
Master's === National Taipei University of Technology === International Program in Electrical Engineering and Computer Science === 106 === The increasing popularity of sentiment analysis has led many companies to use it in after-sales service to improve the quality of their products. Among the tasks in sentiment analysis, sarcasm detection is difficult because features are lacking. Sarcasm is an expression whose meaning is ambiguous; it usually means the opposite of what is written. Most previous work focused on measuring negative and positive sentiment in sentences. However, more features are still needed to improve the results. We therefore compare different feature extraction methods for sarcasm detection: n-gram, sentiment, punctuation, part-of-speech, and topic-model features.
The study consists of five main stages: data collection, data processing, feature extraction, classification, and evaluation. In the first stage, we collect tweets with the hashtag “sarcasm” as sarcastic data using the Twitter API, and then manually divide the data into sarcastic and regular data. The second stage, pre-processing, deletes all hashtags, references to other users (@ symbol), and URLs. The third stage extracts five features: sentiment, n-gram, punctuation, part-of-speech, and topic-model features. All features are combined using a technique called One Hot Encoding before being passed to the next step. In the classification stage, we compare two classification methods: Support Vector Machine and Logistic Regression. For evaluation, we train on each feature individually and find that the n-gram feature gives the highest performance among the features. We also find that Support Vector Machine outperforms Logistic Regression, with an accuracy of 82.65%.
|
author2 |
王正豪 |
author_facet |
王正豪 rahmat fadli isnanto |
author |
rahmat fadli isnanto |
spellingShingle |
rahmat fadli isnanto A Comparative Study of Feature Extraction Methods for Sarcasm Detection |
author_sort |
rahmat fadli isnanto |
title |
A Comparative Study of Feature Extraction Methods for Sarcasm Detection |
title_sort |
comparative study of feature extraction methods for sarcasm detection |
publishDate |
2018 |
url |
http://ndltd.ncl.edu.tw/handle/we52k5 |
work_keys_str_mv |
AT rahmatfadliisnanto acomparativestudyoffeatureextractionmethodsforsarcasmdetection AT rahmatfadliisnanto acomparativestudyoffeatureextractionmethodsforsarcasmdetection AT rahmatfadliisnanto comparativestudyoffeatureextractionmethodsforsarcasmdetection AT rahmatfadliisnanto comparativestudyoffeatureextractionmethodsforsarcasmdetection |
_version_ |
1719259311796912128 |
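
The pre-processing and feature extraction stages described in the abstract can be illustrated with a minimal sketch. This is not the thesis's code: the regular expressions, the function names (`clean_tweet`, `word_ngrams`, `punctuation_features`), the choice to drop the whole hashtag token rather than only the `#` symbol, and the particular punctuation marks counted are all assumptions made for illustration. It uses only the Python standard library.

```python
import re
from collections import Counter

# Patterns for the three kinds of tokens the abstract says are deleted.
# Assumption: a hashtag or mention is removed as a whole token.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")


def clean_tweet(text: str) -> str:
    """Remove URLs, @-mentions and hashtags, then collapse whitespace."""
    # URLs go first so that '#' or '@' inside a URL is not handled separately.
    for pattern in (URL_RE, MENTION_RE, HASHTAG_RE):
        text = pattern.sub(" ", text)
    return " ".join(text.split())


def word_ngrams(text: str, n: int = 2) -> Counter:
    """Count word n-grams over lowercased, whitespace-split tokens."""
    tokens = text.lower().split()
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def punctuation_features(text: str) -> dict:
    """Count a few punctuation marks (an illustrative selection only)."""
    return {
        "exclamation": text.count("!"),
        "question": text.count("?"),
        "ellipsis": text.count("..."),
        "quote": text.count('"'),
    }


# Hypothetical example tweet, not taken from the thesis data set.
tweet = "Oh great, another Monday!!! @boss https://example.com #sarcasm"
cleaned = clean_tweet(tweet)
print(cleaned)
print(word_ngrams(cleaned, n=2).most_common(3))
print(punctuation_features(cleaned))
```

The resulting counts could then be turned into a feature matrix and passed to a classifier such as a Support Vector Machine or Logistic Regression, as the abstract describes; how the thesis itself encodes and combines the features is not specified beyond the mention of One Hot Encoding.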