An investigation of bankruptcy prediction in imbalanced datasets

Authors:

Highlights:

• An investigation of bankruptcy prediction in imbalanced datasets is presented.

• Prediction losses increase as the class imbalance becomes more severe.

• The support vector machine method is less affected by imbalanced datasets than other prediction methods.

• SMOTE outperforms other sampling techniques for all types of prediction models and for different training set sizes.

Abstract

Previous studies of bankruptcy prediction in imbalanced datasets analyze either the loss of prediction due to data imbalance issues or treatment methods for dealing with this issue. The current article presents a combined investigation of the degree of imbalance, loss of performance, and treatment methods. It determines which imbalanced class distributions jeopardize the performance of bankruptcy prediction methods and identifies the recovery capacities of treatment methods. The results show that an imbalanced distribution, in which the minority class represents 20%, significantly disturbs prediction performance. Furthermore, the support vector machine method is less sensitive than other prediction methods to imbalanced distributions, and sampling methods can recover a satisfactory portion of performance losses. Accordingly, this study provides a better understanding of the data imbalance issue in the field of corporate failure and serves as a methodological guide for designing bankruptcy prediction methods in imbalanced datasets.
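To make the sampling idea concrete, the following is a minimal sketch of SMOTE-style oversampling, the technique named in the highlights. It is not the authors' implementation: the function name `smote`, its parameters, and the toy financial ratios are assumptions chosen for illustration, and the base sample is drawn at random rather than by iterating over every minority sample as in the original SMOTE formulation. Each synthetic point is placed on the line segment between a minority-class sample and one of its k nearest minority-class neighbours.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points interpolated between minority samples.

    Simplified, illustrative variant of SMOTE (hypothetical helper).
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than samples
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest minority neighbours of the base point,
        # by Euclidean distance, excluding the base point itself.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Hypothetical toy data: two financial ratios for each of 4 bankrupt firms.
bankrupt = [[0.10, 1.8], [0.12, 2.1], [0.08, 1.9], [0.15, 2.4]]
new_points = smote(bankrupt, n_new=6, k=2)
print(len(new_points))
```

Because every synthetic point is a convex combination of two real minority samples, it stays inside the region already occupied by the minority class. In practice the synthetic points would be appended to the training set only, never to the test set.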

Keywords: C53, G33, Bankruptcy prediction, Imbalanced dataset, Finance

Article history: Received 19 December 2017, Revised 30 May 2018, Accepted 29 June 2018, Available online 2 July 2018, Version of Record 14 July 2018.

Official URL: https://doi.org/10.1016/j.dss.2018.06.011