Conditional Wasserstein GAN-based oversampling of tabular data for imbalanced learning

Authors:

Highlights:

• We design a tabular data GAN for oversampling that can handle categorical variables.

• We assess our GAN in a credit scoring setting using multiple real-world datasets.

• We find GAN-based oversampling to outperform advanced SMOTE-type benchmarks.

• Ablations confirm the specific choices in the proposed GAN architecture.


Keywords: Imbalanced learning, Generative adversarial networks, Credit scoring, Oversampling

Publication history: Received 30 September 2020, revised 14 December 2020, accepted 5 January 2021, available online 13 January 2021, version of record 4 April 2021.

Publisher page (DOI): https://doi.org/10.1016/j.eswa.2021.114582