On feature selection protocols for very low-sample-size data
Authors:
Highlights:
• Feature selection with very few instances, possibly high-dimensional.
• Widely used protocol: 1) feature selection, 2) cross-validation to test a classifier.
• Alternative, proper, protocol includes both steps in a single cross-validation loop.
• Experiment using 24 datasets, 3 feature selection methods and 5 classifier models.
• The proper protocol accuracy is significantly closer to the true accuracy.
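The difference between the two protocols can be illustrated with a minimal, standard-library-only sketch. Everything in it is an assumption made for illustration, not the paper's experimental setup: the data are synthetic pure noise (so the true accuracy of any classifier is 50%), the filter ranks features by the absolute difference of class means, the classifier is a nearest-centroid rule, and leave-one-out is used as the cross-validation scheme for brevity. The function names are hypothetical.

```python
import random

def score_features(X, y):
    """Score each feature by the absolute difference between the class means."""
    pos = [row for row, label in zip(X, y) if label == 1]
    neg = [row for row, label in zip(X, y) if label == 0]
    scores = []
    for j in range(len(X[0])):
        mean_pos = sum(row[j] for row in pos) / len(pos)
        mean_neg = sum(row[j] for row in neg) / len(neg)
        scores.append(abs(mean_pos - mean_neg))
    return scores

def select_top_k(X, y, k):
    """Return the indices of the k highest-scoring features."""
    scores = score_features(X, y)
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]

def nearest_centroid_predict(X_train, y_train, x_test, features):
    """Predict the label whose class centroid is closest on the chosen features."""
    best_label, best_dist = None, float("inf")
    for label in (0, 1):
        rows = [r for r, l in zip(X_train, y_train) if l == label]
        centroid = [sum(r[j] for r in rows) / len(rows) for j in features]
        dist = sum((x_test[j] - c) ** 2 for j, c in zip(features, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

def loo_accuracy(X, y, k, select_inside_loop):
    """Leave-one-out accuracy under the proper (inside) or improper (outside) protocol."""
    # Improper protocol: features are chosen once, using ALL instances,
    # including the ones later used for testing.
    fixed = None if select_inside_loop else select_top_k(X, y, k)
    correct = 0
    for i in range(len(X)):
        X_train, y_train = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        # Proper protocol: features are re-selected from the training part only.
        features = select_top_k(X_train, y_train, k) if select_inside_loop else fixed
        correct += nearest_centroid_predict(X_train, y_train, X[i], features) == y[i]
    return correct / len(X)

def make_noise_data(n_samples, n_features, seed):
    """Gaussian noise features with alternating labels: no real class signal."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in range(n_features)] for _ in range(n_samples)]
    y = [i % 2 for i in range(n_samples)]
    return X, y

X, y = make_noise_data(n_samples=20, n_features=1000, seed=0)
improper = loo_accuracy(X, y, k=10, select_inside_loop=False)
proper = loo_accuracy(X, y, k=10, select_inside_loop=True)
print(f"improper protocol: {improper:.2f}, proper protocol: {proper:.2f}")
```

Because the labels carry no signal, any estimate well above 0.5 is optimistic bias. In this sketch the bias comes from the test instance having already influenced which features were kept.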
Keywords: Feature selection, Wide datasets, Experimental protocol, Training/testing, Cross-validation
Review history: Received 29 September 2017, Revised 28 February 2018, Accepted 11 March 2018, Available online 17 March 2018, Version of Record 24 May 2018.
Official paper URL: https://doi.org/10.1016/j.patcog.2018.03.012