When is resampling beneficial for feature selection with imbalanced wide data?
Authors:
Highlights:
• Wide datasets usually suffer from unbalanced class distributions.
• Feature selection (FS) is commonly recommended for wide datasets.
• We aim to find the best combination and order to apply FS and resampling.
• 14 datasets, 5 classifiers, 7 FS methods, and 7 balancing strategies were tested.
• The best configuration was SVM-RFE used before RUS for the SVM-G classifier.
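The RUS (random undersampling) step named in the last highlight can be sketched as below. This is a minimal illustration, not the authors' implementation: the function name `random_undersample`, the `seed` parameter, and the toy data are assumptions introduced here. In the highlighted configuration, a step like this would run on the training data after feature selection has already reduced the columns.

```python
import random
from collections import defaultdict


def random_undersample(rows, labels, seed=0):
    """Randomly drop samples so every class matches the minority-class size.

    Hypothetical helper for illustration; not taken from the paper.
    """
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Every class is cut down to the size of the smallest class.
    target = min(len(idxs) for idxs in by_class.values())

    kept = []
    for idxs in by_class.values():
        kept.extend(rng.sample(idxs, target))
    kept.sort()  # preserve the original sample order

    return [rows[i] for i in kept], [labels[i] for i in kept]


# Toy imbalanced data (assumed): 8 majority samples (0), 2 minority samples (1).
rows = [[float(i), float(i % 3)] for i in range(10)]
labels = [0] * 8 + [1] * 2

X_bal, y_bal = random_undersample(rows, labels, seed=42)
```

Because the minority class is kept whole and only majority samples are discarded, the balanced set here has two samples per class. On very small wide datasets this discards a large share of the data, which is one reason the order of feature selection and resampling matters.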
Keywords: Feature selection, Wide data, High dimensional data, Very low sample size, Unbalanced, Machine learning
Review history: Received 30 April 2021, Revised 30 July 2021, Accepted 30 September 2021, Available online 15 October 2021, Version of Record 20 October 2021.
Official paper URL: https://doi.org/10.1016/j.eswa.2021.116015