Performance of corporate bankruptcy prediction models on imbalanced dataset: The effect of sampling methods

Authors:

Highlights:

Abstract

Corporate bankruptcy prediction is important to creditors and investors. Most of the literature improves the performance of prediction models by developing and optimizing quantitative methods. This paper instead investigates the effect of sampling methods on the performance of quantitative bankruptcy prediction models on real, highly imbalanced datasets. Seven sampling methods and five quantitative models are tested on two such datasets. Model performance on a randomly paired sample set is also compared with performance on a real imbalanced sample set. The experimental results suggest that the proper sampling method for developing prediction models depends mainly on the number of bankruptcies in the training sample set.
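
To make the terms undersampling and oversampling concrete, the following is a minimal sketch of the two simplest variants, random undersampling and random oversampling, on a toy imbalanced set. It is not a reproduction of the seven sampling methods evaluated in the paper, which the abstract does not name. The function names, the convention that label 1 marks a bankrupt firm, and the 95-to-5 toy data are all assumptions made for illustration.

```python
import random


def _split_indices(y, minority_label):
    """Return (minority_indices, majority_indices) for a list of labels."""
    minority = [i for i, label in enumerate(y) if label == minority_label]
    majority = [i for i, label in enumerate(y) if label != minority_label]
    return minority, majority


def random_undersample(X, y, minority_label=1, seed=0):
    """Drop majority-class rows at random until both classes are the same size."""
    rng = random.Random(seed)
    minority, majority = _split_indices(y, minority_label)
    kept = rng.sample(majority, min(len(minority), len(majority)))
    idx = sorted(minority + kept)
    return [X[i] for i in idx], [y[i] for i in idx]


def random_oversample(X, y, minority_label=1, seed=0):
    """Duplicate minority-class rows at random until both classes are the same size."""
    rng = random.Random(seed)
    minority, majority = _split_indices(y, minority_label)
    if not minority:
        raise ValueError("no minority-class rows to duplicate")
    n_extra = max(0, len(majority) - len(minority))
    extra = [rng.choice(minority) for _ in range(n_extra)]
    idx = list(range(len(y))) + extra
    return [X[i] for i in idx], [y[i] for i in idx]


# Hypothetical toy data: 95 healthy firms (label 0) and 5 bankrupt firms (label 1).
# Each row holds a single made-up feature value.
X = [[float(i)] for i in range(100)]
y = [0] * 95 + [1] * 5

X_under, y_under = random_undersample(X, y)
X_over, y_over = random_oversample(X, y)

print(len(y_under), sum(y_under))  # 5 bankrupt + 5 healthy rows remain
print(len(y_over), sum(y_over))    # 95 healthy + 95 bankrupt rows after duplication
```

Resampling of this kind would normally be applied to the training set only, so that the test set keeps the real class imbalance the abstract emphasizes.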

Keywords: Bankruptcy prediction, Imbalanced dataset, Undersampling, Oversampling, Classification

Review history: Received 9 July 2012, Revised 10 December 2012, Accepted 20 December 2012, Available online 3 January 2013.

Official paper URL: https://doi.org/10.1016/j.knosys.2012.12.007