When is resampling beneficial for feature selection with imbalanced wide data?

Authors:

Highlights:

• Wide datasets usually suffer from unbalanced class distributions.

• Feature selection (FS) is commonly recommended for wide datasets.

• We aim to find the best combination and order to apply FS and resampling.

• 14 datasets, 5 classifiers, 7 FS, and 7 balancing strategies were tested.

• The best configuration was SVM-RFE used before RUS for the SVM-G classifier.
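To make the ordering in the highlights concrete, the following is a minimal sketch of "feature selection first, random undersampling (RUS) second" on toy data. It is not the authors' code. The function names (`select_top_features`, `random_undersample`, `fs_then_rus`) and the toy dataset are hypothetical, and a simple class-mean-difference filter stands in for SVM-RFE so that the example needs only the Python standard library.

```python
import random
from collections import defaultdict


def select_top_features(X, y, k):
    """Stand-in filter (NOT SVM-RFE): rank features by the absolute
    difference between the two class means and keep the top k indices."""
    classes = sorted(set(y))
    assert len(classes) == 2, "this sketch assumes a binary problem"
    n_features = len(X[0])
    scores = []
    for j in range(n_features):
        means = []
        for c in classes:
            vals = [row[j] for row, label in zip(X, y) if label == c]
            means.append(sum(vals) / len(vals))
        scores.append(abs(means[0] - means[1]))
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


def random_undersample(X, y, seed=0):
    """RUS: randomly drop instances until every class has as many
    instances as the smallest class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)
    n_min = min(len(idx) for idx in by_class.values())
    keep = []
    for label in sorted(by_class):
        keep.extend(rng.sample(by_class[label], n_min))
    keep.sort()
    return [X[i] for i in keep], [y[i] for i in keep]


def fs_then_rus(X, y, k, seed=0):
    """Select features on the full (imbalanced) training set, then balance."""
    cols = select_top_features(X, y, k)
    X_reduced = [[row[j] for j in cols] for row in X]
    X_bal, y_bal = random_undersample(X_reduced, y, seed=seed)
    return X_bal, y_bal, cols


# Toy wide, imbalanced data: 8 majority and 3 minority rows, 20 features.
# Only feature 2 carries class signal (an assumption made for illustration).
rng = random.Random(42)
X_toy, y_toy = [], []
for label, n in ((0, 8), (1, 3)):
    for _ in range(n):
        row = [rng.random() for _ in range(20)]
        row[2] += 5.0 * label
        X_toy.append(row)
        y_toy.append(label)

X_bal, y_bal, cols = fs_then_rus(X_toy, y_toy, k=2)
print("selected feature indices:", cols)
print("balanced labels:", y_bal)
```

Swapping the two calls inside `fs_then_rus` gives the opposite order (RUS before feature selection), in which the selector sees only the undersampled rows. In practice the stand-in filter would be replaced by an actual SVM-RFE implementation, for example scikit-learn's `RFE` wrapped around a linear-kernel `SVC`, and the balanced output would be used to train the final classifier.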

Keywords: Feature selection, Wide data, High dimensional data, Very low sample size, Unbalanced, Machine learning

Review history: Received 30 April 2021, Revised 30 July 2021, Accepted 30 September 2021, Available online 15 October 2021, Version of Record 20 October 2021.

Official paper URL: https://doi.org/10.1016/j.eswa.2021.116015