On feature selection protocols for very low-sample-size data

Authors:

Highlights:

• Feature selection for data with very few instances and possibly high dimensionality.

• Widely used protocol: 1) feature selection, 2) cross-validation to test a classifier.

• The alternative, proper protocol performs both steps inside a single cross-validation loop.

• Experiment using 24 datasets, 3 feature selection methods and 5 classifier models.

• The accuracy estimated by the proper protocol is significantly closer to the true accuracy.
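
The difference between the two protocols in the highlights can be illustrated with a minimal sketch. This is not the paper's experimental setup: the feature scorer (absolute difference of class means), the nearest-centroid classifier, the pure-noise synthetic data, and all function names and parameter values are assumptions chosen only to keep the example dependency-free. Because the labels are unrelated to the features, the true accuracy of any classifier here is 0.5 by construction, which makes the bias of each estimate easy to read.

```python
import random


def select_features(X, y, k):
    """Rank features by |mean(class 0) - mean(class 1)| and return the top-k indices."""
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row, lab in zip(X, y) if lab == 0]
        b = [row[j] for row, lab in zip(X, y) if lab == 1]
        scores.append(abs(sum(a) / len(a) - sum(b) / len(b)))
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


def nearest_centroid_predict(X_train, y_train, x, feats):
    """Predict the class whose centroid, on the selected features, is closest to x."""
    best_label, best_dist = None, float("inf")
    for label in (0, 1):
        rows = [r for r, lab in zip(X_train, y_train) if lab == label]
        dist = sum((x[j] - sum(r[j] for r in rows) / len(rows)) ** 2 for j in feats)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def stratified_folds(y, n_folds, rng):
    """Split instance indices into folds with a balanced class mix."""
    idx0 = [i for i, lab in enumerate(y) if lab == 0]
    idx1 = [i for i, lab in enumerate(y) if lab == 1]
    rng.shuffle(idx0)
    rng.shuffle(idx1)
    folds = [[] for _ in range(n_folds)]
    for pos, i in enumerate(idx0 + idx1):
        folds[pos % n_folds].append(i)
    return folds


def cv_accuracy(X, y, k, n_folds, rng, select_inside):
    """Cross-validated accuracy under the improper or the proper protocol."""
    # Improper protocol: features are chosen once from ALL instances,
    # so the labels of the future test folds leak into the selection.
    global_feats = None if select_inside else select_features(X, y, k)
    correct = 0
    for test_idx in stratified_folds(y, n_folds, rng):
        test = set(test_idx)
        X_tr = [X[i] for i in range(len(X)) if i not in test]
        y_tr = [y[i] for i in range(len(X)) if i not in test]
        # Proper protocol: selection is repeated on the training part of each fold.
        feats = select_features(X_tr, y_tr, k) if select_inside else global_feats
        for i in test_idx:
            correct += nearest_centroid_predict(X_tr, y_tr, X[i], feats) == y[i]
    return correct / len(X)


def run_demo(seed=0, n=20, d=500, k=10, n_folds=5, repeats=5):
    """Average both estimates over several pure-noise datasets (true accuracy 0.5)."""
    rng = random.Random(seed)
    improper, proper = [], []
    for _ in range(repeats):
        X = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n)]
        y = [i % 2 for i in range(n)]  # labels carry no information about X
        improper.append(cv_accuracy(X, y, k, n_folds, rng, select_inside=False))
        proper.append(cv_accuracy(X, y, k, n_folds, rng, select_inside=True))
    return sum(improper) / repeats, sum(proper) / repeats


if __name__ == "__main__":
    imp, prop = run_demo()
    print(f"improper protocol estimate: {imp:.2f}")
    print(f"proper protocol estimate:   {prop:.2f}")
```

Under these assumptions the improper estimate should come out well above chance even though no feature is informative, while the proper estimate should stay near 0.5. The only difference between the two runs is whether `select_features` sees the test fold.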

Keywords: Feature selection, Wide datasets, Experimental protocol, Training/testing, Cross-validation

Review history: Received 29 September 2017, Revised 28 February 2018, Accepted 11 March 2018, Available online 17 March 2018, Version of Record 24 May 2018.

Paper URL: https://doi.org/10.1016/j.patcog.2018.03.012