Cost-sensitive semi-supervised selective ensemble model for customer credit scoring

Abstract

In realistic credit-scoring problems, only a small fraction of customers can be labeled, while most cannot. Traditional supervised learning methods build credit-scoring models from labeled samples alone, so satisfactory performance is difficult to achieve. Semi-supervised learning (SSL) addresses this problem by using both labeled and unlabeled samples, but existing credit-scoring research has primarily constructed single semi-supervised models. This study combines SSL, cost-sensitive learning, the group method of data handling (GMDH), and ensemble learning to propose a GMDH-based cost-sensitive semi-supervised selective ensemble (GCSSE) model. The model works in two stages: (1) train an ensemble of N base classifiers on the initial labeled training set L, use it to selectively label samples from the unlabeled dataset U, add those samples with their predicted labels to the training set, and update the N base classifiers on the new training set; (2) classify L and the test set with the trained base classifiers, and construct a cost-sensitive GMDH neural network to obtain the selective ensemble classification results for the test set. Experimental comparisons on five public customer credit-scoring datasets and an empirical analysis of a real customer credit-scoring dataset suggest that this model achieves the best overall credit-scoring performance compared with one supervised ensemble model and three semi-supervised ensemble models.
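
The two-stage procedure described in the abstract can be illustrated with a minimal, standard-library Python sketch. It is not the authors' implementation, and several pieces are assumptions made only for illustration: the base classifier is a nearest-centroid rule, "selective labeling" is approximated by requiring unanimous agreement among the base classifiers, and the cost-sensitive GMDH neural network is replaced by a much simpler stand-in, a greedy selection of base classifiers that minimizes misclassification cost on L. The function names, the toy data, and the cost values (a missed bad customer costs 5, a rejected good customer costs 1) are all hypothetical.

```python
import random

def fit_centroid(X, y):
    """Nearest-centroid base classifier (assumed): one mean vector per class."""
    model = {}
    for label in sorted(set(y)):
        rows = [x for x, t in zip(X, y) if t == label]
        model[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return model

def predict(model, x):
    """Return the label whose centroid is closest to x."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(model, key=lambda label: sq_dist(model[label]))

def train_ensemble(X, y, n, rng):
    """Train n base classifiers, each on a per-class bootstrap resample."""
    models = []
    for _ in range(n):
        idx = []
        for label in sorted(set(y)):
            pool = [i for i, t in enumerate(y) if t == label]
            idx += [rng.choice(pool) for _ in pool]
        models.append(fit_centroid([X[i] for i in idx], [y[i] for i in idx]))
    return models

def vote(models, x):
    """Majority vote; ties go to the smallest label."""
    votes = [predict(m, x) for m in models]
    return max(sorted(set(votes)), key=votes.count)

def stage_one(X_l, y_l, X_u, n=5, min_agreement=1.0, seed=0):
    """Stage 1: label U selectively, then retrain the n base classifiers."""
    rng = random.Random(seed)
    models = train_ensemble(X_l, y_l, n, rng)
    X_aug, y_aug = list(X_l), list(y_l)
    for x in X_u:
        votes = [predict(m, x) for m in models]
        top = max(sorted(set(votes)), key=votes.count)
        # Assumed selection rule: keep a pseudo-label only on enough agreement.
        if votes.count(top) / n >= min_agreement:
            X_aug.append(x)
            y_aug.append(top)
    return train_ensemble(X_aug, y_aug, n, rng), X_aug, y_aug

def total_cost(models, X, y, cost_fn=5.0, cost_fp=1.0):
    """Misclassification cost of the voted ensemble (1 = bad, 0 = good)."""
    cost = 0.0
    for x, t in zip(X, y):
        p = vote(models, x)
        if t == 1 and p == 0:
            cost += cost_fn   # bad customer accepted
        elif t == 0 and p == 1:
            cost += cost_fp   # good customer rejected
    return cost

def stage_two(models, X_l, y_l):
    """Stage 2 stand-in (NOT GMDH): greedily add the base classifier that
    lowers the cost on L the most; stop when no addition improves it."""
    chosen, best, remaining = [], float("inf"), list(models)
    while remaining:
        cand = min(remaining, key=lambda m: total_cost(chosen + [m], X_l, y_l))
        cost = total_cost(chosen + [cand], X_l, y_l)
        if cost >= best:
            break
        chosen.append(cand)
        best = cost
        remaining.remove(cand)
    return chosen

# Toy data (hypothetical): good customers near (0, 0), bad ones near (5, 5).
X_l = [[0, 0], [1, 0], [0, 1], [5, 5], [6, 5], [5, 6]]
y_l = [0, 0, 0, 1, 1, 1]
X_u = [[0.5, 0.5], [5.5, 5.5], [1, 1], [6, 6], [3, 3]]
X_test = [[0.2, 0.1], [5.8, 5.9]]

updated, X_aug, y_aug = stage_one(X_l, y_l, X_u)
selected = stage_two(updated, X_l, y_l)
preds = [vote(selected, x) for x in X_test]
print(preds)
```

The ambiguous unlabeled point `[3, 3]` shows why the labeling is selective: it is added to the training set only if every base classifier agrees on its label. In the paper's model, the second stage is a cost-sensitive GMDH network rather than the greedy selection used here.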

Keywords: Cost-sensitive learning, Credit scoring, Semi-supervised learning, Selective ensemble

Review history: Received 17 January 2019, Revised 5 October 2019, Accepted 10 October 2019, Available online 16 October 2019, Version of Record 16 January 2020.

Paper URL: https://doi.org/10.1016/j.knosys.2019.105118