A Comparative Study of Feature Extraction Methods for Sarcasm Detection

Master's === National Taipei University of Technology === International Program in Electrical Engineering and Computer Science === 106 === The growing popularity of sentiment analysis has led many companies to use it in after-sales service to improve the quality of their products. Among sentiment analysis tasks, sarcasm detection is difficult because few features reliably signal it. Sa...

Bibliographic Details
Main Author:	rahmat fadli isnanto
Other Authors:	王正豪
Format:	Others
Language:	en_US
Published:	2018
Online Access:	http://ndltd.ncl.edu.tw/handle/we52k5

id	ndltd-TW-106TIT0570A009
record_format	oai_dc
spelling	ndltd-TW-106TIT0570A009 2019-10-03T03:40:48Z http://ndltd.ncl.edu.tw/handle/we52k5 A Comparative Study of Feature Extraction Methods for Sarcasm Detection rahmat fadli isnanto 王正豪 2018 Degree thesis ; thesis 58 en_US
collection	NDLTD
language	en_US
format	Others
sources	NDLTD
description	Master's === National Taipei University of Technology === International Program in Electrical Engineering and Computer Science === 106 === The growing popularity of sentiment analysis has led many companies to use it in after-sales service to improve the quality of their products. Among sentiment analysis tasks, sarcasm detection is difficult because few features reliably signal it. Sarcasm is an expression whose meaning is ambiguous; it usually means the opposite of what is written. Most previous work focused on measuring positive and negative sentiment in sentences, but more features are needed to improve the results. We therefore compare several feature extraction methods for sarcasm detection: n-gram, sentiment, punctuation, part-of-speech, and topic-model features. The study consists of five main stages: data collection, data processing, feature extraction, classification, and evaluation. In the first stage, we collect tweets carrying the hashtag "sarcasm" as sarcastic data using the Twitter API, then manually divide the data into sarcastic and regular tweets. In the pre-processing stage, we delete all hashtags, references to other users (@ symbol), and URLs. In the third stage, we extract five features: sentiment, n-gram, punctuation, part-of-speech, and topic-model features. All features are combined using a technique called one-hot encoding before being passed to the next step. In the classification stage, we compare two methods: Support Vector Machine and Logistic Regression. For evaluation, we train on each feature individually and find that the n-gram feature gives the highest performance of the five. We also find that Support Vector Machine outperforms Logistic Regression, achieving an accuracy of 82.65%.
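The pre-processing and feature extraction stages described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the regular expressions, the function names (`preprocess`, `punctuation_features`, `word_ngrams`), and the choice of which punctuation marks to count are all assumptions made for illustration, since the abstract does not give those details.

```python
import re
from collections import Counter

# Assumed patterns; the abstract does not specify the exact cleaning rules.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")


def preprocess(tweet: str) -> str:
    """Delete URLs, @-mentions, and hashtags, then collapse whitespace."""
    text = URL_RE.sub(" ", tweet)
    text = MENTION_RE.sub(" ", text)
    text = HASHTAG_RE.sub(" ", text)
    return " ".join(text.split())


def punctuation_features(text: str) -> dict:
    """Count a few punctuation marks (a hypothetical feature set)."""
    return {
        "exclamations": text.count("!"),
        "questions": text.count("?"),
        "ellipses": text.count("..."),
    }


def word_ngrams(text: str, n: int = 2) -> Counter:
    """Count word n-grams over lowercased, whitespace-split tokens."""
    tokens = text.lower().split()
    return Counter(zip(*(tokens[i:] for i in range(n))))


cleaned = preprocess("Oh great, another Monday!!! #sarcasm @bob http://t.co/abc")
features = punctuation_features(cleaned)
bigrams = word_ngrams(cleaned, n=2)
```

The resulting counts could then be turned into a feature vector and passed to a classifier such as a Support Vector Machine or Logistic Regression, the two methods the abstract compares.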
author2	王正豪
author	rahmat fadli isnanto
title	A Comparative Study of Feature Extraction Methods for Sarcasm Detection
publishDate	2018
url	http://ndltd.ncl.edu.tw/handle/we52k5
