Optimization of K Value in KNN Algorithm for Spam and Ham Email Classification

There are many cases of email abuse that have the potential to harm others. This abuse is commonly known as spam, which may contain advertisements, phishing scams, and even malware. This study aims to classify emails as spam or ham using the KNN method as an effort to reduce th...

Bibliographic Details
Main Authors: Eko Laksono, Achmad Basuki, Fitra Bachtiar
Format: Article
Language: Indonesian
Published: Ikatan Ahli Informatika Indonesia 2020-04-01
Series: Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi)
Subjects: classification, email spam, knn, frequency distribution clustering, k-means clustering
Online Access: http://jurnal.iaii.or.id/index.php/RESTI/article/view/1845
id doaj-c970c8edb32a42339fdfadbcb85cdda0
record_format Article
spelling doaj-c970c8edb32a42339fdfadbcb85cdda0; 2020-11-25T02:01:34Z; ind; Ikatan Ahli Informatika Indonesia; Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi); 2580-0760; 2020-04-01; vol. 4, no. 2, pp. 377-383; doi:10.29207/resti.v4i2.1845; 1845; Optimization of K Value in KNN Algorithm for Spam and Ham Email Classification; Eko Laksono, Achmad Basuki, Fitra Bachtiar (Brawijaya University); http://jurnal.iaii.or.id/index.php/RESTI/article/view/1845; classification, email spam, knn, frequency distribution clustering, k-means clustering
collection DOAJ
language Indonesian
format Article
sources DOAJ
author Eko Laksono
Achmad Basuki
Fitra Bachtiar
title Optimization of K Value in KNN Algorithm for Spam and Ham Email Classification
publisher Ikatan Ahli Informatika Indonesia
series Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi)
issn 2580-0760
publishDate 2020-04-01
description There are many cases of email abuse that have the potential to harm others. This abuse is commonly known as spam, which may contain advertisements, phishing scams, and even malware. This study aims to classify emails as spam or ham using the k-nearest neighbors (KNN) method as an effort to reduce the amount of spam. KNN labels an email as spam or ham, and the study evaluates it with several different values of K. Evaluation with a confusion matrix showed that KNN with K = 1 had the highest accuracy, 91.4%. The study also found that optimizing the K value with frequency distribution clustering can reach an accuracy of 100%, while k-means clustering reaches an accuracy of 99%. Based on these accuracy values, frequency distribution clustering and k-means clustering can both be used to find the optimal K value for KNN in spam email classification.
topic classification, email spam, knn, frequency distribution clustering, k-means clustering
url http://jurnal.iaii.or.id/index.php/RESTI/article/view/1845
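
The description in this record mentions running KNN with different K values and scoring each one with a confusion matrix. The following is a minimal Python sketch of that general idea only, not the authors' implementation: the two numeric features, the toy TRAIN and TEST points, and all function names are hypothetical, and the clustering-based selection of K reported in the article is not reproduced here.

```python
import math
from collections import Counter

# Hypothetical toy data: each item is ((feature_1, feature_2), label).
# The features stand in for any numeric email representation.
TRAIN = [
    ((5, 4), "spam"), ((6, 5), "spam"), ((7, 3), "spam"), ((4, 6), "spam"),
    ((0, 1), "ham"), ((1, 0), "ham"), ((1, 1), "ham"), ((0, 0), "ham"),
    ((2, 1), "ham"),
]
TEST = [
    ((6, 4), "spam"), ((5, 5), "spam"),
    ((1, 2), "ham"), ((0, 2), "ham"),
]


def knn_predict(train, query, k):
    """Label `query` by majority vote among its k nearest training points."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def confusion_matrix(train, test, k, positive="spam"):
    """Return (tp, fp, tn, fn) counts, treating `positive` as the target class."""
    tp = fp = tn = fn = 0
    for features, actual in test:
        predicted = knn_predict(train, features, k)
        if predicted == positive:
            if actual == positive:
                tp += 1
            else:
                fp += 1
        else:
            if actual == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def accuracy(tp, fp, tn, fn):
    """Accuracy is the share of correct predictions among all predictions."""
    return (tp + tn) / (tp + fp + tn + fn)


def sweep_k(train, test, candidates):
    """Map each candidate K to the accuracy it achieves on the test set."""
    return {k: accuracy(*confusion_matrix(train, test, k)) for k in candidates}


if __name__ == "__main__":
    for k, acc in sweep_k(TRAIN, TEST, [1, 3, 5, 9]).items():
        print(f"K={k}: accuracy={acc:.2f}")
```

With odd K and two classes the majority vote cannot tie. Setting K equal to the size of the training set makes every prediction the majority class, which is why a very large K is usually a poor choice.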