SDAI: An integral evaluation methodology for content-based spam filtering models

Abstract

The Tragedy of the Commons theory introduced by Hardin (1968) revealed how shared and limited resources become completely depleted as an effect of human behaviour. By analogy, common spamming activities can be properly modelled by this well-established theory and, consequently, a young Internet security industry has recently emerged to fight spam. However, the massive intensification of spam deliveries in recent years has created the need for a significant improvement in filter accuracy. In this context, current research efforts are mainly focused on providing a wide variety of content-based techniques able to overcome common spam filtering shortcomings. Although theoretical filtering evaluation is generally taken into consideration in scientific works, most evaluation protocols are not appropriate for correctly assessing the performance of models during filter operation in real environments. In order to bridge the gap between basic research and the applied deployment of well-known spam filtering techniques, this work proposes a novel, straightforward evaluation methodology able to rank available models from four different but complementary perspectives: static, dynamic, adaptive and internationalisation. In the present study, we applied our SDAI methodology to compare eight well-known content-based spam filtering techniques using several established accuracy measures. Results showed the effect of knowledge grain size and revealed several unexpected situations related to the behaviour of the analysed models.

Keywords: Spam classification, Content-based techniques, Model comparison, Evaluation methodology

Publication history: Available online 27 April 2012.

Official paper URL: https://doi.org/10.1016/j.eswa.2012.04.064