Classification of Hate Tweets and Their Reasons using SVM

This study focuses on classifying hate messages directed at the mobile operators Verizon, AT&T and Sprint. The main aim is to use the machine learning algorithm Support Vector Machines (SVM) to classify messages into four categories - Hate, Reason, Explicit and Other - in order to...

Bibliographic Details
Main Author:	Tarasova, Natalya
Format:	Others
Language:	English
Published:	Uppsala universitet, Avdelningen för datalogi 2016
Subjects:	Support Vector Machines; classification; Akaike Information Criteria; machine learning; Twitter; hate tweets
Online Access:	http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-275782

id	ndltd-UPSALLA1-oai-DiVA.org-uu-275782
record_format	oai_dc
spelling	ndltd-UPSALLA1-oai-DiVA.org-uu-275782 | 2016-02-11T05:11:10Z | Classification of Hate Tweets and Their Reasons using SVM | eng | Tarasova, Natalya | Uppsala universitet, Avdelningen för datalogi | 2016 | Support Vector Machines | classification | Akaike Information Criteria | machine learning | Twitter | hate tweets | Student thesis | info:eu-repo/semantics/bachelorThesis | text | http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-275782 | UPTEC F, 1401-5757 ; 16001 | application/pdf | info:eu-repo/semantics/openAccess
collection	NDLTD
language	English
format	Others
sources	NDLTD
topic	Support Vector Machines; classification; Akaike Information Criteria; machine learning; Twitter; hate tweets
spellingShingle	Support Vector Machines classification Akaike Information Criteria machine learning Twitter hate tweets Tarasova, Natalya Classification of Hate Tweets and Their Reasons using SVM
description	This study focuses on classifying hate messages directed at the mobile operators Verizon, AT&T and Sprint. The main aim is to use the machine learning algorithm Support Vector Machines (SVM) to classify messages into four categories - Hate, Reason, Explicit and Other - in order to identify a hate message and its reason. The study resulted in two methods: a "naive" method (the Naive Method, NM) and a more "advanced" method (the Partial Timeline Method, PTM). NM is a binary method in the sense that it asks the question: "Does this tweet belong to the class Hate?". PTM asks the same question, but of a limited set of tweets, i.e. only those posted within ± 30 min of the publication of the hate tweet. In summary, the results of the study indicated that PTM is more accurate than NM. However, it does not take into account all tweets on the user's timeline. The choice of method therefore involves a trade-off: PTM offers a more accurate classification and NM offers a more comprehensive classification. === This study focused on finding the hate tweets posted by the customers of three mobile operators, Verizon, AT&T and Sprint, and identifying the reasons for their dissatisfaction. The timelines containing a hate tweet were collected and studied for the presence of an explanation. A machine learning approach was employed using four categories: Hate, Reason, Explanatory and Other. The classification was conducted with a one-versus-all approach using the Support Vector Machines algorithm implemented in the LIBSVM tool. The study resulted in two methodologies: the Naive Method (NM) and the Partial Time-line Method (PTM). The Naive Method relied only on the feature space consisting of the most representative words chosen with the Akaike Information Criterion. PTM utilized the fact that the majority of the explanations were posted within a one-hour time window around the posting of a hate tweet. We found that the accuracy of PTM is higher than that of NM. In addition, PTM saves time and memory by analysing fewer tweets. At the same time, this implies a trade-off between relevance and completeness. === Opponent: Kristina Wettainen
author	Tarasova, Natalya
author_facet	Tarasova, Natalya
author_sort	Tarasova, Natalya
title	Classification of Hate Tweets and Their Reasons using SVM
title_short	Classification of Hate Tweets and Their Reasons using SVM
title_full	Classification of Hate Tweets and Their Reasons using SVM
title_fullStr	Classification of Hate Tweets and Their Reasons using SVM
title_full_unstemmed	Classification of Hate Tweets and Their Reasons using SVM
title_sort	classification of hate tweets and their reasons using svm
publisher	Uppsala universitet, Avdelningen för datalogi
publishDate	2016
url	http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-275782
work_keys_str_mv	AT tarasovanatalya classificationofhatetweetsandtheirreasonsusingsvm
_version_	1718187115655725056
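
The time-window restriction that the abstract attributes to the Partial Timeline Method (only tweets within ± 30 min of the hate tweet are considered) can be illustrated with a minimal sketch. This is not the thesis's implementation: the function name `partial_timeline`, the `(timestamp, text)` tweet representation, the inclusive window boundary, and the sample tweets are all assumptions made up for illustration.

```python
from datetime import datetime, timedelta

# Hypothetical helper: a timeline is assumed to be a list of (timestamp, text) pairs.
def partial_timeline(timeline, hate_time, half_window=timedelta(minutes=30)):
    """Keep only the tweets posted within +/- half_window of hate_time.

    The boundary is treated as inclusive here, which is an assumption.
    """
    return [(ts, text) for ts, text in timeline
            if abs(ts - hate_time) <= half_window]

# Made-up sample timeline, for illustration only.
hate_time = datetime(2016, 1, 1, 12, 0)
timeline = [
    (datetime(2016, 1, 1, 11, 15), "good morning"),
    (datetime(2016, 1, 1, 11, 45), "no signal again since last night"),
    (hate_time, "i hate my carrier"),
    (datetime(2016, 1, 1, 12, 30), "third dropped call today"),
    (datetime(2016, 1, 1, 13, 5), "lunch time"),
]

# Only the tweets inside the one-hour window would be passed on to a classifier.
window = partial_timeline(timeline, hate_time)
for ts, text in window:
    print(ts.isoformat(), text)
```

Under these assumptions, a classifier would only ever see the tweets in `window`, which is where the saving in time and memory mentioned in the abstract would come from; tweets outside the window are never examined, which is the completeness side of the trade-off.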
