A comparison of random forest variable selection methods for classification prediction modeling

Authors:

Highlights:

• We compare the performance of random forest variable selection methods.

• VSURF or Jiang's method is preferable for most datasets.

• varSelRF or Boruta performs well for data with >50 predictors.

• Methods based on conditional random forests usually have similar performance.

• The type of method, test-based or performance-based, is unlikely to impact performance.
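For readers unfamiliar with this class of methods, the sketch below shows a generic random-forest variable selection step based on permutation importance in scikit-learn. It is an illustrative assumption only and does not reproduce any of the specific procedures compared in the paper (VSURF, varSelRF, Boruta, or Jiang's method); the dataset and the selection threshold are likewise chosen purely for demonstration.

# Illustrative sketch: random-forest variable selection via permutation
# importance. NOT the paper's procedures (VSURF, varSelRF, Boruta, Jiang's method).
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Example classification dataset (30 predictors, binary outcome).
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Fit a random forest and rank predictors by permutation importance on held-out data.
rf = RandomForestClassifier(n_estimators=500, random_state=0)
rf.fit(X_train, y_train)
result = permutation_importance(rf, X_test, y_test, n_repeats=20, random_state=0)

# Keep predictors whose mean importance exceeds its estimated variability;
# the published methods use more principled stopping/selection rules.
selected = np.where(result.importances_mean > result.importances_std)[0]
print("Selected predictor indices:", selected)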

Abstract:


Keywords: Random forest, Variable selection, Feature reduction, Classification

Article history: Received 11 October 2018, Revised 21 May 2019, Accepted 22 May 2019, Available online 23 May 2019, Version of Record 6 June 2019.

Article URL: https://doi.org/10.1016/j.eswa.2019.05.028