Towards improving e-mail content classification for spam control: architecture, abstraction, and strategies

This dissertation discusses techniques to improve the effectiveness and the efficiency of spam control. Specifically, layer-3 e-mail content classification is proposed to allow e-mail pre-classification (for fast spam detection at receiving e-mail servers) and to allow distributed processing at netw...

Bibliographic Details
Main Author:	Marsono, Muhammad Nadzir
Other Authors:	Gebali, Fayez
Language:	English en
Published:	2007
Subjects:	Network security; Spam control; UVic Subject Index::Sciences and Engineering::Engineering::Electrical engineering
Online Access:	http://hdl.handle.net/1828/209

id	ndltd-uvic.ca-oai-dspace.library.uvic.ca-1828-209
record_format	oai_dc
collection	NDLTD
language	English en
sources	NDLTD
topic	Network security; Spam control; UVic Subject Index::Sciences and Engineering::Engineering::Electrical engineering
description	This dissertation discusses techniques to improve the effectiveness and efficiency of spam control. Specifically, layer-3 e-mail content classification is proposed to allow e-mail pre-classification (for fast spam detection at receiving e-mail servers) and to allow distributed processing at network nodes for fast spam detection at spam control points, e.g., at e-mail servers. Fast spam detection allows receiving e-mail servers to prioritize e-mail servicing, safeguarding non-spam e-mail delivery even under heavy spam traffic. It also allows spam to be rejected during Simple Mail Transfer Protocol (SMTP) sessions for inbound and outbound spam control. The dissertation makes four contributions.
In our first contribution, we propose a hardware architecture for a naive Bayes content classification unit for high-throughput spam detection. We use the logarithmic number system to simplify the naive Bayes computation. Because logarithmic number system computation is fast but lossy, we analyze the noise model of our hardware architecture. Through noise analysis, synthesis, and verification by numerical simulation, we show that the naive Bayes classification unit, implemented on an FPGA, can process more than one hundred million features per second with very low computation noise, an order of magnitude faster than a general-purpose processor implementation.
In our second contribution, we propose e-mail content pre-classification at the network layer (layer 3) instead of at the application layer (layer 7), as is currently practiced, to allow e-mail packet pre-classification and distributed processing for effective spam detection beyond server implementations. By performing e-mail content classification at a lower abstraction level, e-mail packets can be pre-processed, without reassembly, at any network node between sender and receiver. We demonstrate that naive Bayes e-mail content classification can be adapted for layer-3 processing. We also show that fast e-mail class estimation can be performed at receiving e-mail servers. Through simulation using e-mail data sets, we show that layer-3 e-mail content classification can detect spam with accuracy and false positive values approximately equal to those at layer 7.
In our third contribution, we propose a prioritized e-mail servicing scheme that uses a priority queuing approach to improve spam handling at receiving e-mail servers. In this scheme, non-spam e-mails are given higher priority than spam. Four servicing strategies for the proposed scheme are studied. We analyze the performance of this scheme under different e-mail traffic loads and service capacities. We show that the non-spam delay and loss probability can be reduced when the server is under-provisioned.
In our fourth contribution, we propose a spam handling scheme that rejects spam during SMTP sessions. The proposed scheme allows inbound and outbound spam control. It can reduce server load and hence non-spam queuing delay and loss probability. We analyze the performance of this scheme under different e-mail traffic loads and service capacities. We show that the non-spam delay and loss probability can be reduced when the server is under-provisioned.
In this dissertation, we present four techniques to improve spam control based on e-mail content classification. We envision that our proposed approaches complement rather than replace current spam control systems. The four proposed approaches can work with existing spam control systems and support proactive control of spam and other e-mail-based threats, such as phishing and e-mail worms, anywhere across the Internet.
author2	Gebali, Fayez
author_facet	Gebali, Fayez Marsono, Muhammad Nadzir
author	Marsono, Muhammad Nadzir
author_sort	Marsono, Muhammad Nadzir
title	Towards improving e-mail content classification for spam control: architecture, abstraction, and strategies
publishDate	2007
url	http://hdl.handle.net/1828/209
spelling	ndltd-uvic.ca-oai-dspace.library.uvic.ca-1828-209 2015-01-29T16:50:22Z Marsono, Muhammad Nadzir; Gebali, Fayez; El-Kharashi, M. Watheq 2007-08-28T22:56:36Z 2007 Thesis http://hdl.handle.net/1828/209 English en Available to the World Wide Web
M. N. Marsono, M. W. El-Kharashi, and F. Gebali, “Performance analysis of server-side spam control strategies based on layer-3 classification,” in Proceedings of the IEEE Canadian Conference on Electrical and Computer Engineering (CCECE 2007), Vancouver, BC, Canada, April 2007, pp. 349–352.
M. N. Marsono, M. W. El-Kharashi, and F. Gebali, “Rejecting spam at SMTP sessions,” in IEEE Pacific Rim Conference on Communications, Computers, and Signal Processing (PACRIM 2007), Victoria, BC, Canada, August 2007, pp. 236–239.
M. N. Marsono, M. W. El-Kharashi, and F. Gebali, “Binary LNS-based naïve Bayes hardware classifier for spam control,” in Proceedings of the 2006 IEEE International Symposium on Circuits and Systems (ISCAS 2006), Island of Kos, Greece, May 2006, pp. 3674–3677.
M. N. Marsono, M. W. El-Kharashi, F. Gebali, and S. Ganti, “A distributed e-mail classification for spam control,” in Proceedings of the 2006 Canadian Conference on Electrical and Computer Engineering (CCECE 2006), Ottawa, ON, Canada, May 2006, pp. 438–441.
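
The abstract states that naive Bayes is computed in the logarithmic domain and that e-mail packets can be scored without reassembly. One way to see why these fit together: in the log domain the naive Bayes score is a sum of per-feature terms, so each packet can contribute a partial sum independently. The Python sketch below is only an illustration under assumptions, not the dissertation's implementation. The token probabilities, the 0.5 prior, and the names `packet_score` and `classify` are all hypothetical, and it ignores tokens that straddle packet boundaries.

```python
import math

# Hypothetical per-token likelihoods; a real system would learn these
# from labelled mail. They are not taken from the dissertation.
P_SPAM = {"free": 0.20, "offer": 0.10, "meeting": 0.01, "report": 0.02}
P_HAM = {"free": 0.02, "offer": 0.02, "meeting": 0.10, "report": 0.08}
PRIOR_SPAM = 0.5  # assumed prior probability of spam


def packet_score(tokens):
    """Partial log-likelihood ratio contributed by one packet's tokens."""
    return sum(
        math.log(P_SPAM[t]) - math.log(P_HAM[t])
        for t in tokens
        if t in P_SPAM  # unknown tokens are skipped
    )


def classify(packets):
    """Add per-packet partial scores to the log prior odds and threshold at 0."""
    score = math.log(PRIOR_SPAM) - math.log(1.0 - PRIOR_SPAM)
    for tokens in packets:
        score += packet_score(tokens)
    label = "spam" if score > 0 else "non-spam"
    return label, score
```

Because the score is additive, packets can be scored in any order, or at different nodes, and the partial sums combined later; `classify([["free", "offer"], ["free"]])` and `classify([["free", "offer", "free"]])` yield the same label and, up to floating-point rounding, the same score.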

Similar Items