Analyzing the ability of Naive-Bayes and Label Spreading to predict labels with varying quantities of training data : Classifier Evaluation
A study was performed on the Naive-Bayes and Label Spreading methods applied as classifiers in a spam filter. In the testing procedure their ability to predict was observed and the results were compared in a McNemar test, leading to the discovery of the strengths and weaknesses of the chosen methods i...
Main Authors: | Warsitha, Tedy; Kammerlander, Robin |
---|---|
Format: | Others |
Language: | English |
Published: |
KTH, Skolan för datavetenskap och kommunikation (CSC)
2016
|
Online Access: | http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-188132 |
id |
ndltd-UPSALLA1-oai-DiVA.org-kth-188132 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-UPSALLA1-oai-DiVA.org-kth-188132; 2016-06-10T05:12:48Z; Analyzing the ability of Naive-Bayes and Label Spreading to predict labels with varying quantities of training data : Classifier Evaluation; eng; Prestandaanalys av två metoder inom semisupervised och supervised maskininlärning [Performance analysis of two methods in semi-supervised and supervised machine learning]; Warsitha, Tedy; Kammerlander, Robin; KTH, Skolan för datavetenskap och kommunikation (CSC); 2016; Student thesis; info:eu-repo/semantics/bachelorThesis; text; http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-188132; application/pdf; info:eu-repo/semantics/openAccess |
collection |
NDLTD |
language |
English |
format |
Others
|
sources |
NDLTD |
description |
A study was performed on the Naive-Bayes and Label Spreading methods applied as classifiers in a spam filter. In the testing procedure their ability to predict was observed and the results were compared in a McNemar test, leading to the discovery of the strengths and weaknesses of the chosen methods in an environment of varying training data. Though the results were inconclusive due to resource restrictions, the theory is discussed from various angles in order to provide a better understanding of the conditions that can lead to potentially different results between the chosen methods, opening the way for improvement and further studies. The conclusion of this study is that a significant difference exists between the two classifiers in their ability to predict labels. As a secondary point, it is recommended to choose a classifier depending on the available training data and computational power. === Swedish abstract (translated): A study was performed on the classification methods Naive-Bayes and Label Spreading applied in a spam filter. The methods' ability to predict was observed and the results were compared in a McNemar test, which led to the discovery of the strengths and weaknesses of the chosen methods in an environment with varying training data. Although the results were incomplete due to insufficient resources, the underlying theory is discussed from several angles. This discussion aims to give a better understanding of the underlying conditions that can lead to potentially different results for the chosen methods. Furthermore, this opens up possibilities for improvements and future studies. The conclusion drawn from this study is that significant differences exist between the two chosen classifiers in their ability to predict classes. The final recommendation is to choose a classifier based on the supply of training data and the availability of computing power. |
author |
Warsitha, Tedy; Kammerlander, Robin |
spellingShingle |
Warsitha, Tedy; Kammerlander, Robin; Analyzing the ability of Naive-Bayes and Label Spreading to predict labels with varying quantities of training data : Classifier Evaluation |
author_facet |
Warsitha, Tedy; Kammerlander, Robin |
author_sort |
Warsitha, Tedy |
title |
Analyzing the ability of Naive-Bayes and Label Spreading to predict labels with varying quantities of training data : Classifier Evaluation |
title_short |
Analyzing the ability of Naive-Bayes and Label Spreading to predict labels with varying quantities of training data : Classifier Evaluation |
title_full |
Analyzing the ability of Naive-Bayes and Label Spreading to predict labels with varying quantities of training data : Classifier Evaluation |
title_fullStr |
Analyzing the ability of Naive-Bayes and Label Spreading to predict labels with varying quantities of training data : Classifier Evaluation |
title_full_unstemmed |
Analyzing the ability of Naive-Bayes and Label Spreading to predict labels with varying quantities of training data : Classifier Evaluation |
title_sort |
analyzing the ability of naive-bayes and label spreading to predict labels with varying quantities of training data : classifier evaluation |
publisher |
KTH, Skolan för datavetenskap och kommunikation (CSC) |
publishDate |
2016 |
url |
http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-188132 |
work_keys_str_mv |
AT warsithatedy analyzingtheabilityofnaivebayesandlabelspreadingtopredictlabelswithvaryingquantitiesoftrainingdataclassifierevaluation AT kammerlanderrobin analyzingtheabilityofnaivebayesandlabelspreadingtopredictlabelswithvaryingquantitiesoftrainingdataclassifierevaluation AT warsithatedy prestandaanalysavtvametoderinomsemisupervisedochsupervisedmaskininlarning AT kammerlanderrobin prestandaanalysavtvametoderinomsemisupervisedochsupervisedmaskininlarning |
_version_ |
1718301610071818240 |
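
The abstract states that the predictions of the two classifiers were compared in a McNemar test. The record does not say how the thesis implemented it, so the following is only a minimal sketch of how such a paired comparison could look, assuming the common continuity-corrected form of the statistic. The function name `mcnemar_test` and the toy labels are hypothetical and are not taken from the thesis.

```python
import math


def mcnemar_test(y_true, pred_a, pred_b):
    """Continuity-corrected McNemar test on paired predictions.

    Returns (statistic, p_value). Only the discordant pairs matter:
    b = classifier A correct and classifier B wrong,
    c = classifier A wrong and classifier B correct.
    """
    b = sum(1 for t, a, p in zip(y_true, pred_a, pred_b) if a == t and p != t)
    c = sum(1 for t, a, p in zip(y_true, pred_a, pred_b) if a != t and p == t)
    if b + c == 0:
        # The classifiers never disagree in correctness: no evidence of a difference.
        return 0.0, 1.0
    statistic = (abs(b - c) - 1) ** 2 / (b + c)
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical toy data: 1 = spam, 0 = not spam (not results from the thesis).
y_true = [1] * 12
pred_a = [1] * 10 + [0] * 2   # classifier A is wrong on the last two messages
pred_b = [0] * 10 + [1] * 2   # classifier B is wrong on the first ten messages
statistic, p_value = mcnemar_test(y_true, pred_a, pred_b)
print(f"statistic={statistic:.3f}, p={p_value:.4f}")
```

A small p-value suggests that the two classifiers differ in how often they are correct on the same messages. With few discordant pairs the chi-square approximation is rough, and an exact binomial variant is usually preferred.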