Novel artificial bee colony based feature selection method for filtering redundant information

作者:Youwei Wang, Lizhou Feng, Jianming Zhu

摘要

Feature selection, which can reduce the dimensions of feature space without sacrificing the performance of the classifier, is an effective technique for text classification. Because many classifiers cannot deal with the features with high dimensions, filtering the redundant information from the original feature space becomes one of the core goals in feature selection field. In this paper, the concept of equivalence word set is introduced and a set of equivalence word sets (represented as EWS 1) is constructed using the rich semantic information of the Open Directory Project (ODP). On this basis, an artificial bee colony based feature selection method is proposed for filtering the redundant information, and a feature subset FS is obtained by using an optimal feature selection (OFS) method and two predetermined thresholds. In order to obtain the best predetermined thresholds, an improved memory based artificial bee colony method (IABCM) is proposed. In the experiments, fuzzy support vector machine (FSVM) and Naïve Bayesian (NB) classifiers are used on six datasets: LingSpam, WebKB, SpamAssian, 20-Newsgroups, Reuters21578 and TREC2007. Experimental results verify that when FSVM and NB are applied, the proposed method is efficient and achieves better accuracy than several representative feature selection methods.

论文关键词:Feature selection, Text classification, Parameter optimization, Artificial bee colony, Fuzzy support Vector machine

论文评审过程:

论文官网地址:https://doi.org/10.1007/s10489-017-1010-4