Analyzing the ability of Naive-Bayes and Label Spreading to predict labels with varying quantities of training data : Classifier Evaluation

A study was performed on the Naive-Bayes and Label Spreading methods applied as classifiers in a spam filter. In the testing procedure their ability to predict was observed and the results were compared in a McNemar test, leading to the discovery of the strengths and weaknesses of the chosen methods i...
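
The summary does not say which form of the McNemar test was used or how it was set up. As a minimal sketch only, assuming the exact (binomial) form applied to paired predictions from two classifiers on the same test messages, the comparison could look like the following. The function name `mcnemar_exact` and the sample labels are hypothetical and are not taken from the thesis.

```python
from math import comb


def mcnemar_exact(y_true, pred_a, pred_b):
    """Exact two-sided McNemar test on paired predictions (illustrative sketch).

    Returns (b, c, p_value), where
      b = cases classifier A got right and classifier B got wrong,
      c = cases classifier A got wrong and classifier B got right.
    Cases where both are right or both are wrong do not enter the test.
    """
    b = sum(1 for t, a, p in zip(y_true, pred_a, pred_b) if a == t and p != t)
    c = sum(1 for t, a, p in zip(y_true, pred_a, pred_b) if a != t and p == t)
    n = b + c
    if n == 0:
        # The classifiers never disagree on correctness: no evidence of a difference.
        return b, c, 1.0
    k = min(b, c)
    # Under H0 each discordant case is equally likely to favour A or B,
    # so the smaller count follows Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)


# Hypothetical labels: 1 = spam, 0 = not spam.
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
nb_pred = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]  # stand-in for Naive-Bayes output
ls_pred = [1, 1, 0, 1, 0, 0, 0, 1, 1, 0]  # stand-in for Label Spreading output
b, c, p_value = mcnemar_exact(y_true, nb_pred, ls_pred)
print(f"b={b}, c={c}, p={p_value:.4f}")
```

A small p-value would suggest that the two classifiers differ in how often they predict correctly; with few discordant cases the test has little power, which is one way limited data can leave a comparison inconclusive.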


Bibliographic Details
Main Authors: Warsitha, Tedy, Kammerlander, Robin
Format: Others
Language: English
Published: KTH, Skolan för datavetenskap och kommunikation (CSC) 2016
Online Access: http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-188132
id ndltd-UPSALLA1-oai-DiVA.org-kth-188132
record_format oai_dc
spelling ndltd-UPSALLA1-oai-DiVA.org-kth-188132; 2016-06-10T05:12:48Z; Analyzing the ability of Naive-Bayes and Label Spreading to predict labels with varying quantities of training data : Classifier Evaluation; eng; Swedish title: Prestandaanalys av två metoder inom semisupervised och supervised maskininlärning (Performance analysis of two methods in semi-supervised and supervised machine learning); Warsitha, Tedy; Kammerlander, Robin; KTH, Skolan för datavetenskap och kommunikation (CSC); 2016; Student thesis; info:eu-repo/semantics/bachelorThesis; text; http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-188132; application/pdf; info:eu-repo/semantics/openAccess
collection NDLTD
language English
format Others
sources NDLTD
description A study was performed on the Naive-Bayes and Label Spreading methods applied as classifiers in a spam filter. In the testing procedure their ability to predict was observed and the results were compared in a McNemar test, leading to the discovery of the strengths and weaknesses of the chosen methods in an environment of varying training data. Though the results were inconclusive due to resource restrictions, the theory is discussed from various angles in order to provide a better understanding of the conditions that can lead to potentially different results between the chosen methods, opening up for improvement and further studies. The conclusion drawn from this study is that a significant difference exists in terms of ability to predict labels between the two classifiers. On a secondary note it is recommended to choose a classifier depending on available training data and computational power.
author Warsitha, Tedy
Kammerlander, Robin
spellingShingle Warsitha, Tedy
Kammerlander, Robin
Analyzing the ability of Naive-Bayes and Label Spreading to predict labels with varying quantities of training data : Classifier Evaluation
author_facet Warsitha, Tedy
Kammerlander, Robin
author_sort Warsitha, Tedy
title Analyzing the ability of Naive-Bayes and Label Spreading to predict labels with varying quantities of training data : Classifier Evaluation
title_short Analyzing the ability of Naive-Bayes and Label Spreading to predict labels with varying quantities of training data : Classifier Evaluation
title_full Analyzing the ability of Naive-Bayes and Label Spreading to predict labels with varying quantities of training data : Classifier Evaluation
title_fullStr Analyzing the ability of Naive-Bayes and Label Spreading to predict labels with varying quantities of training data : Classifier Evaluation
title_full_unstemmed Analyzing the ability of Naive-Bayes and Label Spreading to predict labels with varying quantities of training data : Classifier Evaluation
title_sort analyzing the ability of naive-bayes and label spreading to predict labels with varying quantities of training data : classifier evaluation
publisher KTH, Skolan för datavetenskap och kommunikation (CSC)
publishDate 2016
url http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-188132
work_keys_str_mv AT warsithatedy analyzingtheabilityofnaivebayesandlabelspreadingtopredictlabelswithvaryingquantitiesoftrainingdataclassifierevaluation
AT kammerlanderrobin analyzingtheabilityofnaivebayesandlabelspreadingtopredictlabelswithvaryingquantitiesoftrainingdataclassifierevaluation
AT warsithatedy prestandaanalysavtvametoderinomsemisupervisedochsupervisedmaskininlarning
AT kammerlanderrobin prestandaanalysavtvametoderinomsemisupervisedochsupervisedmaskininlarning
_version_ 1718301610071818240