Helmholtz principle based supervised and unsupervised feature selection methods for text mining

Authors:

Highlights:

• We propose new supervised and unsupervised feature selection methods for text classification, called Meaning Based Feature Selection (MBFS).

• We adapt and use the meaning measure as a new method for feature selection.

• The meaning measure is based on the Helmholtz principle from the Gestalt theory of human perception.

• MBFS methods are compared with nine well-known feature selection methods on six datasets.

• Experimental results show that MBFS methods are effective and run faster than several widely used feature selection methods.
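
The highlights above do not state the meaning measure itself, so the following is only a rough sketch of how a supervised, meaning-based ranking might look. It assumes the form of the measure commonly given in the Helmholtz-principle literature: for a term occurring m times in a class and K times in the whole corpus, with N = corpus length / class length, NFA = C(K, m) / N^(m-1) and Meaning = -(1/m) log NFA. The function names, the max-over-classes aggregation, and the top-k cutoff are illustrative assumptions, not the authors' MBFS implementation.

```python
import math
from collections import Counter


def meaning(m, K, N):
    """Assumed meaning score: -(1/m) * log(C(K, m) / N**(m - 1)).

    m: occurrences of the term in one class
    K: occurrences of the term in the whole corpus
    N: corpus length divided by class length (assumed definition)
    """
    if m < 1:
        return float("-inf")  # term absent from the class
    log_nfa = math.log(math.comb(K, m)) - (m - 1) * math.log(N)
    return -log_nfa / m


def select_features(docs, labels, top_k):
    """Rank terms by their highest per-class meaning score (hypothetical helper)."""
    class_counts = {}
    total = Counter()
    for tokens, label in zip(docs, labels):
        class_counts.setdefault(label, Counter()).update(tokens)
        total.update(tokens)

    corpus_len = sum(total.values())
    scores = {}
    for counts in class_counts.values():
        N = corpus_len / sum(counts.values())
        for term, m in counts.items():
            score = meaning(m, total[term], N)
            # Assumption: keep the best score a term achieves in any class.
            scores[term] = max(scores.get(term, float("-inf")), score)

    return sorted(scores, key=scores.get, reverse=True)[:top_k]


docs = [["ball", "goal", "goal", "the"], ["vote", "vote", "law", "the"]]
labels = ["sport", "politics"]
top_terms = select_features(docs, labels, top_k=2)
```

In this toy corpus, terms concentrated in a single class ("goal", "vote") score above a term spread evenly across classes ("the"), which is the behaviour a class-discriminating feature selector is meant to have. The unsupervised variant mentioned in the highlights is not sketched here.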

Keywords: Feature selection, Attribute selection, Machine learning, Text mining, Text classification, Helmholtz principle

Review history: Received 5 December 2014, Revised 10 November 2015, Accepted 31 March 2016, Available online 5 May 2016, Version of Record 22 July 2016.

Paper URL: https://doi.org/10.1016/j.ipm.2016.03.007