Dynamic self-paced sampling ensemble for highly imbalanced and class-overlapped data classification

Authors: Fang Zhou, Suting Gao, Lyu Ni, Martin Pavlovski, Qiwen Dong, Zoran Obradovic, Weining Qian

Abstract

Datasets with imbalanced class distributions arise in many real-world applications. A large number of approaches have been proposed to address the class imbalance challenge, but most of them perform poorly when the data are characterized by high class imbalance, class overlap, and low data quality. In this study, we propose an effective meta-framework for highly imbalanced, class-overlapped classification, called DAPS (DynAmic self-Paced sampling enSemble), which (1) applies reasonable and effective sampling to maximize the use of informative instances while avoiding serious information loss, and (2) assigns appropriate instance weights to handle noisy data. Furthermore, most existing canonical classifiers (e.g., Decision Tree, Random Forest) can be integrated into DAPS. Comprehensive experiments on synthetic data and three real-world datasets show that DAPS obtains considerable improvements in F1-score over a broad range of published models.
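The abstract does not specify DAPS's sampling schedule or its instance-weighting rule, so the following is only a rough illustration of the general family it belongs to: a self-paced under-sampling ensemble with a pluggable base classifier, scored by F1. It is a sketch under stated assumptions, NOT the DAPS algorithm. Assumptions: binary labels with 1 as the minority class; scikit-learn's `DecisionTreeClassifier` as the base learner; a hardness-bin scheme in the style of self-paced ensemble methods, with a self-paced factor `tan(pi * i / (2 * n))`. The function names `fit_self_paced_ensemble`, `predict_proba_minority` and `f1_score_binary` are hypothetical, and DAPS's noise-oriented instance weights are not modeled.

```python
import numpy as np
from sklearn.base import clone
from sklearn.tree import DecisionTreeClassifier


def fit_self_paced_ensemble(X, y, base=None, n_estimators=10, k_bins=5, seed=0):
    """Hypothetical self-paced under-sampling ensemble (illustrative, NOT DAPS).

    Assumes binary labels (1 = minority, 0 = majority) and more majority
    than minority instances. Each member is trained on a balanced subset.
    """
    rng = np.random.default_rng(seed)
    if base is None:
        base = DecisionTreeClassifier(max_depth=3, random_state=0)
    maj_idx = np.flatnonzero(y == 0)
    min_idx = np.flatnonzero(y == 1)
    models = []
    for i in range(n_estimators):
        if not models:
            # Round 0: plain random under-sampling of the majority class.
            picked = rng.choice(maj_idx, size=len(min_idx), replace=False)
        else:
            # Hardness of a majority instance = current ensemble's minority probability.
            hardness = predict_proba_minority(models, X[maj_idx])
            # Split majority instances into k_bins equal-width hardness bins.
            inner_edges = np.linspace(hardness.min(), hardness.max(), k_bins + 1)[1:-1]
            bins = np.digitize(hardness, inner_edges)
            counts = np.bincount(bins, minlength=k_bins)
            bin_mean = np.array([hardness[bins == b].mean() if counts[b] else 0.0
                                 for b in range(k_bins)])
            # Self-paced factor grows each round; bin shares ~ 1 / (mean hardness + alpha)
            # flatten toward uniform as alpha grows, shifting emphasis to the rarer hard bins.
            alpha = np.tan(np.pi * i / (2 * n_estimators))
            # Spread each bin's share evenly over its members (all weights are > 0).
            w = 1.0 / (bin_mean[bins] + alpha) / counts[bins]
            picked = rng.choice(maj_idx, size=len(min_idx), replace=False, p=w / w.sum())
        subset = np.concatenate([picked, min_idx])
        models.append(clone(base).fit(X[subset], y[subset]))
    return models


def predict_proba_minority(models, X):
    # Average the minority-class (label 1) probability over all ensemble members.
    return np.mean([m.predict_proba(X)[:, 1] for m in models], axis=0)


def f1_score_binary(y_true, y_pred):
    # F1 = 2TP / (2TP + FP + FN) for the minority (positive) class.
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


if __name__ == "__main__":
    # Toy imbalanced data: 500 majority vs 25 minority points with some overlap.
    rng = np.random.default_rng(42)
    X = np.vstack([rng.normal(0, 1, (500, 2)), rng.normal(2.5, 1, (25, 2))])
    y = np.array([0] * 500 + [1] * 25)
    models = fit_self_paced_ensemble(X, y)
    pred = (predict_proba_minority(models, X) >= 0.5).astype(int)
    print("training F1:", f1_score_binary(y, pred))
```

Swapping `base` for another scikit-learn classifier that implements `predict_proba` keeps the same loop, which mirrors the abstract's point that canonical classifiers can be plugged in; how DAPS itself schedules sampling and weights noisy instances is described in the full paper.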

Keywords: Dynamic self-paced sampling, High class imbalance, Class-overlapped data

Paper URL: https://doi.org/10.1007/s10618-022-00838-z