Handling imbalance in hierarchical classification problems using local classifiers approaches

Authors: Rodolfo M. Pereira, Yandre M. G. Costa, Carlos N. Silla Jr.

Abstract

The task of learning from imbalanced datasets has been widely investigated in binary, multi-class and multi-label classification scenarios. Although this problem also affects hierarchical datasets, few works in the literature address it. Meanwhile, local classifier approaches are the most widely used techniques in the literature for hierarchical classification problems. In this paper, we present new ways to handle data imbalance in hierarchical classification problems when using local classifier approaches. We propose three different resampling schemas, one for each local classification approach: (1) Local Classifier per Node; (2) Local Classifier per Parent Node; and (3) Local Classifier per Level. To quantify how imbalanced a given hierarchical dataset is, we also propose three novel metrics that measure imbalance in hierarchical datasets under each of these local classification approaches. The experimental evaluation on eight well-known datasets showed that the proposed metrics can indeed measure dataset imbalance, and that the proposed resampling schemas improve classification results compared with baseline, state-of-the-art and related-work approaches.
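The abstract does not define the three proposed metrics. As a generic, hypothetical illustration only (not the authors' metrics), the sketch below computes the standard imbalance ratio (largest class count divided by smallest class count) over the training sets that a local classifier per parent node and a local classifier per level would each see. It assumes every label is a path tuple of uniquely named nodes from the top level down; the helper names and the `<root>` placeholder are hypothetical.

```python
from collections import Counter, defaultdict

ROOT = "<root>"  # hypothetical name for the virtual root of the hierarchy


def imbalance_ratio(counter):
    """Standard imbalance ratio: majority count / minority count.

    A training set with fewer than two classes is treated as balanced (1.0).
    """
    if len(counter) < 2:
        return 1.0
    values = list(counter.values())
    return max(values) / min(values)


def parent_node_imbalance(paths):
    """Imbalance ratio of each parent node's local multi-class problem.

    For a local classifier per parent node, the parent's training set holds
    every example whose path passes through it, labelled by the child taken.
    """
    counts = defaultdict(Counter)
    for path in paths:
        parent = ROOT
        for node in path:
            counts[parent][node] += 1
            parent = node  # leaves never become keys: no child follows them
    return {parent: imbalance_ratio(c) for parent, c in counts.items()}


def level_imbalance(paths):
    """Imbalance ratio of each level's multi-class problem (per-level view)."""
    depth = max(len(path) for path in paths)
    ratios = []
    for level in range(depth):
        counter = Counter(path[level] for path in paths if len(path) > level)
        ratios.append(imbalance_ratio(counter))
    return ratios


if __name__ == "__main__":
    # Toy two-level hierarchy: A -> {A.1, A.2}, B -> {B.1}
    data = [("A", "A.1")] * 6 + [("A", "A.2")] * 2 + [("B", "B.1")] * 2
    print(parent_node_imbalance(data))  # {'<root>': 4.0, 'A': 3.0, 'B': 1.0}
    print(level_imbalance(data))        # [4.0, 3.0]
```

In the toy data the root-level split (8 vs. 2) is more skewed than the split under `A` (6 vs. 2), which is the kind of difference between local training sets that a single global class-frequency count would hide.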

Keywords: Hierarchical datasets, Local classifiers algorithms, Class imbalance, Measuring imbalance

Paper URL: https://doi.org/10.1007/s10618-021-00762-8