Summary: | Master's === Tamkang University === Master's Program, Department of Statistics === 99 === Many sectors have contributed to the fight against spam e-mail, and although protection keeps improving, challenges remain.
This study focuses on how much information is added to the model; for this reason, we hope to explain the output through an improved set of input elements.
We use 14 features of sender behavior and 20 keywords that TF-IDF ranked as the most effective. In addition, we propose 24 new semantic-component variables that model writers' habits and capture how expression differs between
spam e-mail senders and legitimate e-mail senders. The results show that using all variables together gives the best classification results, whether the classifier is C4.5, MLP, or PNN.
|