The Study of Enhancing Spam Filtering Mechanism

Master's === 高苑科技大學 === 資訊科技應用研究所 === 100 === This thesis combines a multi-level prevention mechanism with data mining techniques and evaluates the resulting spam filtering effect. For the evaluation, a total of 5333 emails were collected, including both junk and regular emails. All emails are f...

Bibliographic Details
Main Authors: Chih-Cheng Su, 蘇智成
Other Authors: Chin-Yuan Hsieh
Format: Others
Language: zh-TW
Published: 2012
Online Access: http://ndltd.ncl.edu.tw/handle/80153163199199294765
id ndltd-TW-100KYIT0396014
record_format oai_dc
collection NDLTD
language zh-TW
format Others
sources NDLTD
description Master's === 高苑科技大學 === 資訊科技應用研究所 === 100 === This thesis combines a multi-level prevention mechanism with data mining techniques and evaluates the resulting spam filtering effect. For the evaluation, a total of 5333 emails were collected, including both junk and regular emails. All emails are filtered in three steps, and the accuracy and efficiency of each step are reported below. The first method uses DNS reverse lookup, a blacklist, and the UCE level setting of the e-mail filtering mechanism to assess spam filtering accuracy and efficiency. The second method applies data mining techniques: the message text is transcoded, analyzed into words, and stripped of stop words, and the frequency of occurrence of each word is calculated. The PART analysis technique is then applied to evaluate the accuracy and efficiency of data mining filtering. The third method uses the same data mining steps (message transcoding, quasi-word analysis, and stop-word processing) and additionally judges the correctness of the hyperlinks contained in the message to evaluate the spam filter's accuracy rate and efficiency. The results show that, for the first method, the accuracy rate of the spam filter is about 65.45% and the average filtering time is about 13.8 microseconds (μs). Accuracy is higher, about 65.66%, when the DNS reverse lookup step is applied first and the blacklist with the UCE level technique is applied second. Conversely, accuracy is about 65.23% when the blacklist step is applied first and the DNS reverse lookup step with the UCE level technique is applied second. The difference in accuracy is 0.43 percentage points. To filter spam efficiently, we therefore recommend applying the DNS reverse lookup method first and then the blacklist with the UCE level technique, which gives the higher filtering accuracy and efficiency.
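The recommended ordering for the first method (DNS reverse lookup first, blacklist second) can be sketched as a small staged filter. This is a minimal illustration under stated assumptions, not the thesis implementation: the `Email` fields, the `classify` function, and the `rdns_check` parameter are hypothetical names, and the UCE level setting is not modeled because the abstract does not define it. Only `socket.gethostbyaddr` is a real standard-library call.

```python
import socket
from dataclasses import dataclass
from typing import Callable, Set


@dataclass(frozen=True)
class Email:
    # Hypothetical minimal view of a message; the thesis does not specify its data model.
    sender_ip: str       # IP address of the connecting mail server
    sender_domain: str   # domain part of the sender address


def has_reverse_dns(ip: str) -> bool:
    """Return True if the IP address resolves back to a host name."""
    try:
        socket.gethostbyaddr(ip)
        return True
    except OSError:  # socket.herror and socket.gaierror are OSError subclasses
        return False


def classify(email: Email,
             blacklist: Set[str],
             rdns_check: Callable[[str], bool] = has_reverse_dns) -> str:
    """Apply the two stages in the recommended order and return 'spam' or 'ham'."""
    # Stage 1: reject senders whose IP address has no reverse DNS entry.
    if not rdns_check(email.sender_ip):
        return "spam"
    # Stage 2: reject senders whose domain appears on the blacklist.
    if email.sender_domain.lower() in blacklist:
        return "spam"
    return "ham"
```

Passing a stub for `rdns_check` (for example `lambda ip: True`) keeps the sketch testable without network access; with that stub, `classify(Email("192.0.2.10", "shop.example"), {"bad.example"}, rdns_check=lambda ip: True)` returns `"ham"`. Swapping the order of the two stages changes only which check runs first, which is the comparison the abstract reports.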
In the second method, four attribute counts are used in the data mining step: 128, 256, 512, and 1024. The analysis shows a correct filtering rate of about 92.7% with 128 attributes, about 92.96% with 256 attributes, about 92.92% with 512 attributes, and about 92.95% with 1024 attributes. The results show that 256 attributes work best for the data mining spam filter. When filtering on the email sender and subject, the correct rate of spam filtering is about 92.3%, the average filtering time per email is about 104 microseconds (μs), and the ratio of accuracy to filtering time is about 0.888%/μs. When filtering on the email sender, subject, and message, the correct rate of spam filtering is about 93.45%, the average filtering time per email is about 1134 microseconds (μs), and the ratio of accuracy to filtering time is about 0.827%/μs, which is lower than for filtering on sender and subject alone. In the third method, the email sender, the message subject, and the message, together with the correctness of the hyperlinks, are used to evaluate spam filter accuracy and efficiency. The correct rate of spam filtering is about 99.35%, the average filtering time per email is about 94 microseconds (μs), and the ratio of accuracy to filtering time is about 1.057%/μs. This is the most effective of the methods tested. In conclusion, the best filtering technique is to apply DNS reverse lookup first and then the blacklist with the UCE level setting. Data mining filtering then performs best when it further filters on the email sender, subject, and message with 256 attributes and judges hyperlink correctness. This research also found that relationships exist between different attributes and message content when data mining is used for spam filtering. Future work is to find a predictive model for these relationships.
In addition, the hyperlink technique can be modified to improve spam detection and increase the filtering effect.
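The "%/μs" figures in the abstract appear to be the correct filtering rate divided by the average filtering time per email. The sketch below recomputes that ratio from the quoted numbers; the function name `accuracy_per_microsecond` is a label chosen here, not a term from the thesis. As a hedged observation, the 0.888 and 1.057 values follow directly from the quoted times, while the 0.827 value is matched by a time near 113 μs rather than the recorded 1134 μs; the sketch computes both and does not settle which figure was intended.

```python
def accuracy_per_microsecond(accuracy_percent: float, mean_time_us: float) -> float:
    """Ratio of correct filtering rate (%) to average filtering time (μs)."""
    if mean_time_us <= 0:
        raise ValueError("mean_time_us must be positive")
    return accuracy_percent / mean_time_us


# (label, correct rate in %, average time in μs) as quoted in the abstract.
reported = [
    ("sender + subject", 92.3, 104.0),
    ("sender + subject + message", 93.45, 1134.0),
    ("sender + subject + message + hyperlink check", 99.35, 94.0),
]

for label, rate, time_us in reported:
    ratio = accuracy_per_microsecond(rate, time_us)
    print(f"{label}: {ratio:.4f} %/us")

# Assumption for comparison only: a time of 113 μs in place of the recorded 1134 μs.
print(f"with 113 us: {accuracy_per_microsecond(93.45, 113.0):.4f} %/us")
```

Dividing accuracy by time rewards filters that are both accurate and fast, which is why the third method ranks highest on this measure.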
author2 Chin-Yuan Hsieh
author_facet Chin-Yuan Hsieh
Chih-Cheng Su
蘇智成
author Chih-Cheng Su
蘇智成
spellingShingle Chih-Cheng Su
蘇智成
The Study of Enhancing Spam Filtering Mechanism
author_sort Chih-Cheng Su
title The Study of Enhancing Spam Filtering Mechanism
title_short The Study of Enhancing Spam Filtering Mechanism
title_full The Study of Enhancing Spam Filtering Mechanism
title_fullStr The Study of Enhancing Spam Filtering Mechanism
title_full_unstemmed The Study of Enhancing Spam Filtering Mechanism
title_sort study of enhancing spam filtering mechanism
publishDate 2012
url http://ndltd.ncl.edu.tw/handle/80153163199199294765
work_keys_str_mv AT chihchengsu thestudyofenhancingspamfilteringmechanism
AT sūzhìchéng thestudyofenhancingspamfilteringmechanism
AT chihchengsu qiánghuàguòlǜlājīyóujiànjīzhìzhīyánjiū
AT sūzhìchéng qiánghuàguòlǜlājīyóujiànjīzhìzhīyánjiū
AT chihchengsu studyofenhancingspamfilteringmechanism
AT sūzhìchéng studyofenhancingspamfilteringmechanism
_version_ 1718054698653581312
spelling ndltd-TW-100KYIT0396014 2015-10-13T21:02:41Z http://ndltd.ncl.edu.tw/handle/80153163199199294765 The Study of Enhancing Spam Filtering Mechanism 強化過濾垃圾郵件機制之研究 Chih-Cheng Su 蘇智成 碩士 高苑科技大學 資訊科技應用研究所 100 Chin-Yuan Hsieh 謝金原 2012 學位論文 ; thesis 60 zh-TW