Study of Hellinger Distance as a splitting metric for Random Forests in balanced and imbalanced classification datasets

Authors:

Highlights:

• Hellinger Distance (HD) is a robust splitting metric for Random Forests (RF).

• HD is statistically better for imbalanced datasets with respect to AUC.

• HD is statistically better for two-class datasets with respect to Brier score.

• The combination of HD and Gini improves the robustness of RF.
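For readers unfamiliar with the metric, the following is a minimal sketch of how a Hellinger Distance value is commonly computed for a candidate binary split in a two-class problem. It is not taken from the paper: the function name `hellinger_split_value` and its count-based arguments are hypothetical, and the formula is the widely used two-class definition, which may differ in detail from the authors' formulation.

```python
import math


def hellinger_split_value(left_pos, left_neg, right_pos, right_neg):
    """Hellinger Distance of a binary split for a two-class problem.

    Arguments are class counts in the left and right child nodes
    (hypothetical interface, chosen for illustration only).
    Larger values indicate a split that separates the classes better.
    """
    n_pos = left_pos + right_pos  # positives in the parent node
    n_neg = left_neg + right_neg  # negatives in the parent node
    if n_pos == 0 or n_neg == 0:
        # Only one class is present, so no split can separate anything.
        return 0.0

    total = 0.0
    for pos, neg in ((left_pos, left_neg), (right_pos, right_neg)):
        # Compare the fraction of each class that falls into this branch.
        total += (math.sqrt(pos / n_pos) - math.sqrt(neg / n_neg)) ** 2
    return math.sqrt(total)


# The value depends only on per-class fractions, so multiplying the
# negative counts by 10 (a more imbalanced dataset) leaves it unchanged.
balanced = hellinger_split_value(40, 10, 10, 40)
imbalanced = hellinger_split_value(40, 100, 10, 400)
print(balanced, imbalanced)
```

Because each class is normalised by its own total, the class prior does not enter the computation, which is the usual explanation for why this metric is considered for imbalanced data.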

Keywords: Hellinger Distance, Imbalanced problems, Random Forests

Review history: Received 12 July 2019, Revised 31 October 2019, Accepted 30 January 2020, Available online 31 January 2020, Version of Record 7 February 2020.

Official paper URL: https://doi.org/10.1016/j.eswa.2020.113264