An empirical study of data intrinsic characteristics that make learning from imbalanced data difficult

Authors:

Highlights:

• The problem of class imbalance is exacerbated by various data characteristics.

• Some classifiers cannot conceptually cope with the problem of small disjuncts.

• Resampling approaches can help alleviate issues of class overlap and small disjuncts.

• Class noise can be considered the most disruptive data intrinsic characteristic.

Keywords: Classification, Class imbalance, Class overlapping, Data intrinsic characteristics, Noise, Small disjuncts

Review history: Received 17 February 2021, Revised 25 May 2021, Accepted 25 May 2021, Available online 29 May 2021, Version of Record 31 May 2021.

Official paper URL: https://doi.org/10.1016/j.eswa.2021.115297