A Comparative Study of Feature Extraction Methods for Sarcasm Detection

Master's === National Taipei University of Technology === International Graduate Program in Electrical Engineering and Computer Science === 106 === The increasing popularity of sentiment analysis has led many companies to use it in after-sales service to improve the quality of their products. Among sentiment analysis tasks, sarcasm detection is particularly difficult because few distinguishing features are available. Sa...


Bibliographic Details
Main Author: Rahmat Fadli Isnanto
Other Authors: 王正豪
Format: Others
Language: en_US
Published: 2018
Online Access: http://ndltd.ncl.edu.tw/handle/we52k5
id ndltd-TW-106TIT0570A009
record_format oai_dc
spelling ndltd-TW-106TIT0570A009 2019-10-03T03:40:48Z http://ndltd.ncl.edu.tw/handle/we52k5 A Comparative Study of Feature Extraction Methods for Sarcasm Detection rahmat fadli isnanto 王正豪 2018 thesis 58 en_US
collection NDLTD
language en_US
format Others
sources NDLTD
description Master's === National Taipei University of Technology === International Graduate Program in Electrical Engineering and Computer Science === 106 === The increasing popularity of sentiment analysis has led many companies to use it in after-sales service to improve the quality of their products. Among sentiment analysis tasks, sarcasm detection is particularly difficult because few distinguishing features are available. Sarcasm is an expression whose meaning is ambiguous; it usually conveys the opposite of what is written. Most previous work focused on measuring negative and positive sentiment in sentences, but more features are still needed to improve the results. We therefore compare different feature extraction methods for sarcasm detection: n-gram, sentiment, punctuation, part-of-speech, and topic modeling features. The study consists of five main stages: data collection, data processing, feature extraction, classification, and evaluation. In the first stage, we collect tweets with the hashtag “sarcasm” as sarcastic data using the Twitter API, and then manually divide the data into sarcastic and regular data. Pre-processing consists of deleting all hashtags, references to other users (the @ symbol), and URLs. The third stage extracts five features: sentiment, n-gram, punctuation, part-of-speech, and topic model features. All features are combined using a technique called one-hot encoding for processing in the next step. In the classification stage, we compare two classification methods: Support Vector Machine and Logistic Regression. For evaluation, we train on each feature individually and find that the n-gram feature gives the highest performance. We also find that the Support Vector Machine outperforms Logistic Regression, with an accuracy of 82.65%.
author2 王正豪
author_facet 王正豪
rahmat fadli isnanto
author rahmat fadli isnanto
spellingShingle rahmat fadli isnanto
A Comparative Study of Feature Extraction Methods for Sarcasm Detection
author_sort rahmat fadli isnanto
title A Comparative Study of Feature Extraction Methods for Sarcasm Detection
title_short A Comparative Study of Feature Extraction Methods for Sarcasm Detection
title_full A Comparative Study of Feature Extraction Methods for Sarcasm Detection
title_fullStr A Comparative Study of Feature Extraction Methods for Sarcasm Detection
title_full_unstemmed A Comparative Study of Feature Extraction Methods for Sarcasm Detection
title_sort comparative study of feature extraction methods for sarcasm detection
publishDate 2018
url http://ndltd.ncl.edu.tw/handle/we52k5
work_keys_str_mv AT rahmatfadliisnanto acomparativestudyoffeatureextractionmethodsforsarcasmdetection
AT rahmatfadliisnanto comparativestudyoffeatureextractionmethodsforsarcasmdetection
_version_ 1719259311796912128
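
The pre-processing and two of the feature-extraction steps named in the description (n-gram and punctuation features) can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the regular expressions, the function names `preprocess`, `word_ngrams`, and `punctuation_features`, the choice of punctuation marks, and the sample tweet are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical patterns: the record does not state the exact cleaning rules.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")


def preprocess(tweet: str) -> str:
    """Delete URLs, @-mentions, and hashtags, then collapse whitespace."""
    for pattern in (URL_RE, MENTION_RE, HASHTAG_RE):
        tweet = pattern.sub(" ", tweet)
    return " ".join(tweet.split())


def word_ngrams(text: str, n: int = 2) -> Counter:
    """Count word n-grams over lowercased alphabetic tokens."""
    tokens = re.findall(r"[a-z']+", text.lower())
    # zip over n shifted copies of the token list yields consecutive n-tuples.
    return Counter(zip(*(tokens[i:] for i in range(n))))


def punctuation_features(text: str) -> dict:
    """Count a few punctuation marks (an illustrative selection only)."""
    return {
        "exclamation": text.count("!"),
        "question": text.count("?"),
        "ellipsis": text.count("..."),
    }


# Made-up sample tweet, not taken from the thesis data set.
tweet = "Oh great, another Monday!!! #sarcasm @boss http://example.com/x"
clean = preprocess(tweet)
bigrams = word_ngrams(clean, n=2)
punct = punctuation_features(clean)
print(clean)
print(bigrams)
print(punct)
```

In a full pipeline, counts like these would be turned into fixed-length vectors and passed to a classifier such as a Support Vector Machine or Logistic Regression; that step is omitted here because the record does not describe how it was configured.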