Filtrering av e-post : Binär klassifikation med naiv Bayesiansk teknik

In this thesis we compare how different strategies for choosing attribute values affect junk mail filtering. We used two variants of a naïve Bayesian junk mail filter. The first variant classified an e-mail by comparing it to a feature vector containing all attribute values that were found...

Bibliographic Details
Main Authors:	Bünger, Sara; Nilsson, Stefan
Format:	Others
Language:	Swedish
Published:	Högskolan i Borås, Institutionen Biblioteks- och informationsvetenskap / Bibliotekshögskolan 2007
Subjects:	automatisk klassifikation (automatic classification); bayesianskt filter (Bayesian filter); skräppost (junk mail); filtrering (filtering); Social Sciences; Samhällsvetenskap (social sciences)
Online Access:	http://urn.kb.se/resolve?urn=urn:nbn:se:hb:diva-18675

English title:	Filtering e-mail : Binary classification with naïve Bayesian technique
Publisher (English):	University of Borås/Swedish School of Library and Information Science (SSLIS)
Thesis level:	D (student thesis; info:eu-repo/semantics/bachelorThesis)
Series:	Magisteruppsats i biblioteks- och informationsvetenskap vid institutionen Biblioteks- och informationsvetenskap, 1654-0247 ; 2007:132
Local identifier:	Local 2320/2902
File format:	application/pdf
Access:	info:eu-repo/semantics/openAccess
Record timestamp:	2019-05-01T05:15:54Z
Description:	In this thesis we compare how different strategies for choosing attribute values affect junk mail filtering. We used two variants of a naïve Bayesian junk mail filter. The first variant classified an e-mail by comparing it to a feature vector containing all attribute values found in junk mails in the part of the e-mail collection used for training the filter. The second variant compared an e-mail to a feature vector consisting of the attributes found in ten or more junk mails in that same training part. The e-mail collection consisted of 300 e-mails, of which 210 were junk mails and 90 were legitimate e-mails. We measured the results using SP, SR and F1, and cross-validated the two strategies to be able to compare them. The results showed that the first strategy achieved higher average F1 values than the second. Despite this, we believe that the second strategy is the better one: instead of comparing the e-mail to a feature vector containing all attribute values found in junk mails, the results will be better if the filter compares the e-mail to a feature vector that contains a limited number of attribute values.
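The two attribute-selection strategies and the evaluation measures named in the description can be illustrated with a minimal Python sketch. It is not the thesis's implementation. It assumes that an attribute is a lowercase, whitespace-separated token and that SP and SR stand for spam precision and spam recall, as is common in the junk mail filtering literature; the record itself does not define them. The function names `junk_vocabulary` and `spam_metrics` and the labels "junk" and "legit" are hypothetical.

```python
from collections import Counter


def junk_vocabulary(junk_mails, min_mails=1):
    """Return the attributes that occur in at least `min_mails` junk mails.

    Assumption: an attribute is a lowercase whitespace-separated token.
    min_mails=1 keeps every attribute seen in any junk mail (first strategy);
    min_mails=10 keeps only attributes seen in ten or more (second strategy).
    """
    mail_counts = Counter()
    for text in junk_mails:
        # set() ensures each mail contributes at most once per attribute
        mail_counts.update(set(text.lower().split()))
    return {token for token, n in mail_counts.items() if n >= min_mails}


def spam_metrics(true_labels, predicted_labels):
    """Return (SP, SR, F1) with "junk" as the positive class.

    Assumption: SP = spam precision, SR = spam recall.
    """
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == "junk" and p == "junk")
    fp = sum(1 for t, p in pairs if t == "legit" and p == "junk")
    fn = sum(1 for t, p in pairs if t == "junk" and p == "legit")
    sp = tp / (tp + fp) if tp + fp else 0.0
    sr = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * sp * sr / (sp + sr) if sp + sr else 0.0
    return sp, sr, f1
```

Calling `junk_vocabulary(training_junk, min_mails=10)` on the junk part of a training fold would give the limited feature vector of the second variant, while the default `min_mails=1` gives the full one. Averaging the F1 returned by `spam_metrics` over the cross-validation folds would give the kind of figure the description compares.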
