Towards improving e-mail content classification for spam control: architecture, abstraction, and strategies
This dissertation discusses techniques to improve the effectiveness and the efficiency of spam control. Specifically, layer-3 e-mail content classification is proposed to allow e-mail pre-classification (for fast spam detection at receiving e-mail servers) and to allow distributed processing at netw...
Main Author: | Marsono, Muhammad Nadzir |
---|---|
Other Authors: | Gebali, Fayez |
Language: | English en |
Published: | 2007 |
Subjects: | Network security, Spam control |
Online Access: | http://hdl.handle.net/1828/209 |
id |
ndltd-uvic.ca-oai-dspace.library.uvic.ca-1828-209 |
---|---|
record_format |
oai_dc |
collection |
NDLTD |
language |
English en |
sources |
NDLTD |
topic |
Network security Spam control UVic Subject Index::Sciences and Engineering::Engineering::Electrical engineering |
description |
This dissertation discusses techniques to improve the effectiveness and efficiency of spam control. Specifically, layer-3 e-mail content classification is proposed to allow e-mail pre-classification (for fast spam detection at receiving e-mail servers) and to allow distributed processing at network nodes for fast spam detection at spam control points, e.g., at e-mail servers. Fast spam detection allows e-mail servicing to be prioritized at receiving e-mail servers, safeguarding non-spam e-mail deliveries even under heavy spam traffic. Fast spam detection also allows spam to be rejected during Simple Mail Transfer Protocol (SMTP) sessions for inbound and outbound spam control. The dissertation makes four contributions.
In our first contribution, we propose a hardware architecture for a naive Bayes content classification unit for high-throughput spam detection. We use the logarithmic number system to simplify the naive Bayes computation. Because logarithmic number system computation is fast but lossy, we analyze the noise model of our hardware architecture. Through noise analysis, synthesis, and verification by numerical simulation, we show that the naive Bayes classification unit, implemented on an FPGA, can process more than one hundred million features per second with very low computation noise, an order of magnitude faster than a general-purpose processor implementation.
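As a software illustration of why a logarithmic representation simplifies naive Bayes (this is not the dissertation's hardware design), the sketch below replaces the product of probabilities with a sum of base-2 logarithms and rounds each stored logarithm to a fixed-point grid to mimic finite-precision noise. The likelihood table, the priors, and the `FRAC_BITS` precision are hypothetical values chosen only for the example.

```python
import math

# Hypothetical per-feature likelihoods P(feature | class); not from the dissertation.
LIKELIHOODS = {
    "spam": {"free": 0.30, "meeting": 0.02, "offer": 0.25},
    "ham": {"free": 0.05, "meeting": 0.20, "offer": 0.04},
}
PRIORS = {"spam": 0.5, "ham": 0.5}
FRAC_BITS = 8  # assumed number of fractional bits for stored log values


def quantize(x, frac_bits=FRAC_BITS):
    """Round x to a fixed-point grid, mimicking finite-precision log storage."""
    scale = 1 << frac_bits
    return round(x * scale) / scale


def log_score(features, cls, quantized=False):
    """Log2 of prior times likelihoods: every multiplication becomes an addition."""
    terms = [math.log2(PRIORS[cls])]
    terms += [math.log2(LIKELIHOODS[cls][f]) for f in features]
    if quantized:
        terms = [quantize(t) for t in terms]
    return sum(terms)


def classify(features, quantized=False):
    """Pick the class with the larger log-domain score."""
    return max(PRIORS, key=lambda c: log_score(features, c, quantized))
```

Each rounded term is off by at most half a grid step, so the error of a score with n terms is bounded by n times that half step; a bound of this kind is the sort of quantity a noise analysis would track.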
In our second contribution, we propose e-mail content pre-classification at the network layer (layer 3) instead of at the application layer (layer 7), as is currently practiced, to allow e-mail packet pre-classification and distributed processing for effective spam detection beyond server implementations. By performing e-mail content classification at a lower abstraction level, e-mail packets can be pre-processed, without reassembly, at any network node between sender and receiver. We demonstrate that naive Bayes e-mail content classification can be adapted for layer-3 processing. We also show that fast e-mail class estimation can be performed at receiving e-mail servers. Through simulation using e-mail data sets, we show that layer-3 e-mail content classification detects spam with accuracy and false positive values approximately equal to those at layer 7.
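One way to see why naive Bayes lends itself to per-packet processing is that its log-domain score is a sum over features, so each packet can contribute a partial sum without waiting for the others. The sketch below illustrates that additivity only; the `LLR` table of log-likelihood ratios is hypothetical, tokens are assumed not to straddle packet boundaries, and the dissertation's actual layer-3 adaptation may differ.

```python
# Hypothetical log-likelihood ratios log(P(tok | spam) / P(tok | ham)).
LLR = {"free": 1.8, "offer": 1.6, "winner": 2.1, "meeting": -2.3, "report": -1.5}


def packet_score(payload):
    """Partial score of one packet payload, computed without any other packet."""
    return sum(LLR.get(tok, 0.0) for tok in payload.lower().split())


def message_score(payloads):
    """Running class estimate: per-packet partial scores simply add up."""
    return sum(packet_score(p) for p in payloads)


def is_spam(payloads, threshold=0.0):
    """Positive total evidence is labeled spam (threshold is an assumption)."""
    return message_score(payloads) > threshold
```

Because addition is commutative, the total does not depend on packet arrival order, which is what lets intermediate nodes score packets independently in this simplified model.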
In our third contribution, we propose a prioritized e-mail servicing scheme that uses priority queuing to improve spam handling at receiving e-mail servers. In this scheme, non-spam e-mails are given higher priority than spam. Four servicing strategies for the proposed scheme are studied. We analyze the performance of this scheme under different e-mail traffic loads and service capacities. We show that the non-spam delay and loss probability can be reduced when the server is under-provisioned.
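The basic idea of two-class priority servicing can be sketched with a binary heap. This is a generic non-preemptive priority queue, not one of the four strategies studied in the dissertation (the abstract does not name them); the class name `PriorityMailQueue` and its methods are hypothetical.

```python
import heapq
from itertools import count

NON_SPAM, SPAM = 0, 1  # lower value is served first


class PriorityMailQueue:
    """Serve non-spam before spam, first-in first-out within each class."""

    def __init__(self):
        self._heap = []
        self._seq = count()  # arrival counter breaks ties in FIFO order

    def enqueue(self, message_id, is_spam):
        priority = SPAM if is_spam else NON_SPAM
        heapq.heappush(self._heap, (priority, next(self._seq), message_id))

    def dequeue(self):
        """Return the id of the next message to service."""
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)
```

With arrivals in the order spam, non-spam, spam, non-spam, the two non-spam messages are serviced first, so their waiting time no longer depends on how much spam is queued ahead of them.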
In our fourth contribution, we propose a spam handling scheme that rejects spam during Simple Mail Transfer Protocol sessions. The proposed scheme allows inbound and outbound spam control. It can reduce server load and hence non-spam queuing delay and loss probability. We analyze the performance of this scheme under different e-mail traffic loads and service capacities. We show that the non-spam delay and loss probability can be reduced when the server is under-provisioned.
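Rejecting within the session means the server answers with a permanent-failure reply instead of accepting the message and filtering it later. The sketch below assumes the decision is made once the message data has been received and that a spam score is already available; the function name, threshold, and reply texts are hypothetical, while 250 and 550 are standard SMTP reply codes.

```python
def smtp_data_reply(spam_score, threshold=0.0):
    """Choose the reply sent after the end-of-data marker of an SMTP session."""
    if spam_score > threshold:
        # Permanent failure: the message is never queued on this server.
        return 550, "5.7.1 Message rejected as spam"
    return 250, "2.0.0 Message accepted for delivery"
```

A rejected message consumes no queue slot or delivery work, which is the mechanism by which server load drops in this simplified picture.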
In this dissertation, we present four techniques to improve spam control based on e-mail content classification. We envision that our proposed approaches complement rather than replace current spam control systems. The four proposed approaches can work with existing spam control systems and support proactive control of spam and other e-mail-based threats, such as phishing and e-mail worms, anywhere across the Internet. |
author2 |
Gebali, Fayez |
author_facet |
Gebali, Fayez Marsono, Muhammad Nadzir |
author |
Marsono, Muhammad Nadzir |
author_sort |
Marsono, Muhammad Nadzir |
title |
Towards improving e-mail content classification for spam control: architecture, abstraction, and strategies |
publishDate |
2007 |
url |
http://hdl.handle.net/1828/209 |
work_keys_str_mv |
AT marsonomuhammadnadzir towardsimprovingemailcontentclassificationforspamcontrolarchitectureabstractionandstrategies |
_version_ |
1716728919943544832 |
spelling |
ndltd-uvic.ca-oai-dspace.library.uvic.ca-1828-209 2015-01-29T16:50:22Z Towards improving e-mail content classification for spam control: architecture, abstraction, and strategies Marsono, Muhammad Nadzir Gebali, Fayez El-Kharashi, M. Watheq Network security Spam control UVic Subject Index::Sciences and Engineering::Engineering::Electrical engineering 2007-08-28T22:56:36Z 2007-08-28T22:56:36Z 2007 2007-08-28T22:56:36Z Thesis http://hdl.handle.net/1828/209
M. N. Marsono, M. W. El-Kharashi, and F. Gebali, “Performance analysis of server-side spam control strategies based on layer-3 classification,” in Proceedings of the IEEE Canadian Conference on Electrical and Computer Engineering (CCECE 2007), Vancouver, BC, Canada, April 2007, pp. 349–352.
M. N. Marsono, M. W. El-Kharashi, and F. Gebali, “Rejecting spam at SMTP sessions,” in IEEE Pacific Rim Conference on Communications, Computers, and Signal Processing (PACRIM 2007), Victoria, BC, Canada, August 2007, pp. 236–239.
M. N. Marsono, M. W. El-Kharashi, and F. Gebali, “Binary LNS-based naïve Bayes hardware classifier for spam control,” in Proceedings of the 2006 IEEE International Symposium on Circuits and Systems (ISCAS 2006), Island of Kos, Greece, May 2006, pp. 3674–3677.
M. N. Marsono, M. W. El-Kharashi, F. Gebali, and S. Ganti, “A distributed e-mail classification for spam control,” in Proceedings of the 2006 Canadian Conference on Electrical and Computer Engineering (CCECE 2006), Ottawa, ON, Canada, May 2006, pp. 438–441.
English en Available to the World Wide Web |