Semi-supervised learning using frequent itemset and ensemble learning for SMS classification

Authors:

Highlights:

• We use semi-supervised learning, supported by frequent itemsets and ensemble learning, to classify SMS messages as ham or spam.

• We evaluate on the publicly available UCI SMS Spam Collection and on the small and big data sets of the SMS Spam Collection Corpus v.0.1.

• We compare our results with the existing semi-supervised learning methods PEBL and SpyEM.

• We obtain good results with a very small amount of positive data and varying amounts of unlabeled data.

Abstract

We use semi-supervised learning, supported by frequent itemsets and ensemble learning, to classify SMS messages as ham or spam. We evaluate on the publicly available UCI SMS Spam Collection and on the small and big data sets of the SMS Spam Collection Corpus v.0.1. We compare our results with the existing semi-supervised learning methods PEBL and SpyEM. We obtain good results with a very small amount of positive data and varying amounts of unlabeled data.

Keywords: Short Message Service (SMS), Ham, Spam, Frequent itemset, Ensemble learning, Semi-supervised classification

Review history: Available online 16 September 2014.

Paper URL: https://doi.org/10.1016/j.eswa.2014.08.054