A comparative study for content-based dynamic spam classification using four machine learning algorithms

Authors:

Abstract

The growth in the number of email users has led to a dramatic increase in spam email over the past few years. In this paper, four machine learning algorithms, namely Naïve Bayesian (NB), neural network (NN), support vector machine (SVM) and relevance vector machine (RVM), are proposed for spam classification. An empirical evaluation of them on benchmark spam filtering corpora is presented. The experiments are performed with different training set sizes and different numbers of extracted features. Experimental results show that the NN classifier is unsuitable for use on its own as a spam rejection tool. In general, the SVM and RVM classifiers clearly outperform the NB classifier. Compared with SVM, RVM is shown to provide similar classification results with fewer relevance vectors and a much faster testing time. Despite its slower learning procedure, RVM is more suitable than SVM for spam classification in applications that require low complexity.
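As a rough illustration of what content-based classification involves, the sketch below implements a small multinomial Naïve Bayes classifier with Laplace smoothing in plain Python. It is not the authors' implementation: the whitespace tokenizer, the toy messages, and the names `tokenize`, `train_nb` and `predict_nb` are all assumptions made for this example, and the paper's benchmark corpora and feature extraction are not reproduced here.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: simple lowercase whitespace tokenization.
    return text.lower().split()


def train_nb(messages, labels):
    """Count words per class and collect the training vocabulary."""
    class_counts = Counter(labels)
    word_counts = {label: Counter() for label in class_counts}
    for text, label in zip(messages, labels):
        word_counts[label].update(tokenize(text))
    vocab = set()
    for counts in word_counts.values():
        vocab.update(counts)
    return word_counts, class_counts, vocab


def predict_nb(model, text):
    """Return the class with the highest log posterior score."""
    word_counts, class_counts, vocab = model
    total_messages = sum(class_counts.values())
    scores = {}
    for label in class_counts:
        # Log prior from class frequencies in the training set.
        score = math.log(class_counts[label] / total_messages)
        # Laplace (add-one) smoothing over the training vocabulary.
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in tokenize(text):
            if word in vocab:  # words unseen in training are ignored
                score += math.log((word_counts[label][word] + 1) / denom)
        scores[label] = score
    return max(scores, key=scores.get)


# Toy training data, invented for illustration only.
train_messages = [
    "win a free prize now",
    "cheap loans click now",
    "free money offer",
    "meeting agenda for monday",
    "lunch with the project team",
    "please review the attached report",
]
train_labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

model = train_nb(train_messages, train_labels)
print(predict_nb(model, "free prize offer"))
print(predict_nb(model, "project meeting on monday"))
```

Varying the number of training messages and restricting `vocab` to a chosen subset of words would mimic, on a very small scale, the two experimental factors named in the abstract (training set size and extracted feature size).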

Keywords: Spam classification, Naïve Bayesian, Neural network, Support vector machine, Relevance vector machine

Review history: Received 11 December 2007, Accepted 15 January 2008, Available online 1 February 2008.

Official paper URL: https://doi.org/10.1016/j.knosys.2008.01.001