Model validation failure in class imbalance problems

Authors:

Highlights:

• Model validation is inherently difficult under class imbalance when the minority class is rare in an absolute sense.

• Validation performance may misrepresent the generalization ability of classification models.

• Random-guessing models can achieve considerably high validation performance by chance.

• A higher degree of absolute rarity increases the likelihood of model validation failure.
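
The last two highlights can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the authors' experimental protocol. It assumes that a "random guessing" model assigns independent uniform scores to every validation sample, and it uses AUC (computed as the Mann-Whitney statistic) as the validation metric. The function names `auc` and `chance_of_high_auc`, the 0.7 threshold, and the set sizes are hypothetical choices for illustration.

```python
import bisect
import random


def auc(pos_scores, neg_scores):
    """AUC as the Mann-Whitney statistic: the probability that a randomly
    chosen positive outscores a randomly chosen negative (ties count 1/2)."""
    neg_sorted = sorted(neg_scores)
    wins = 0.0
    for p in pos_scores:
        below = bisect.bisect_left(neg_sorted, p)   # negatives strictly below p
        upto = bisect.bisect_right(neg_sorted, p)   # negatives <= p
        wins += below + 0.5 * (upto - below)        # ties contribute half a win
    return wins / (len(pos_scores) * len(neg_scores))


def chance_of_high_auc(n_pos, n_neg, threshold=0.7, trials=2000, seed=0):
    """Fraction of simulated validation sets on which a random-guessing model
    (uniform scores, hypothetical) reaches AUC >= threshold purely by chance."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        pos = [rng.random() for _ in range(n_pos)]  # minority-class scores
        neg = [rng.random() for _ in range(n_neg)]  # majority-class scores
        if auc(pos, neg) >= threshold:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    # Hold the majority count fixed and vary the absolute number of minority samples.
    for n_pos in (5, 20, 100):
        print(n_pos, chance_of_high_auc(n_pos, n_neg=500))
```

Holding the majority class fixed and shrinking the minority count isolates absolute rarity: with only a handful of minority samples the AUC of a model with no skill varies widely, so a spuriously "good" score becomes noticeably more likely than with a hundred minority samples.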

Keywords: Class imbalance, Model validation, Absolute rarity, Performance evaluation

Article history: Received 13 February 2019, Revised 6 August 2019, Accepted 6 January 2020, Available online 9 January 2020, Version of Record 15 January 2020.

Official page: https://doi.org/10.1016/j.eswa.2020.113190