A hybrid system with filter approach and multiple population genetic algorithm for feature selection in credit scoring

作者:

Highlights:

摘要

With the financial crisis happened in 2007, massive credit risks are exposed to the banking sectors. So credit scoring has attracted more and more attention. Bank owns a lot of customer data. By using those data, credit scoring model can judge the applicants’ credit risk accurately. But those data are often high dimensional, and have some irrelevant features. Those irrelevant features will affect classifiers accuracy. Therefore, feature selection is an important topic. This paper proposes a two-phase hybrid approach based on filter approach and multiple population genetic algorithm-HMPGA. In phase 1, it introduces the idea of wrapper approach into three filter approaches to acquire some important prior information for initial populations setting of MPGA. In phase 2, it takes advantage of MPGA’s characteristics of global optimization and quick convergence to find optimal feature subset. This paper uses two real credit scoring datasets of UCI databases to compare HMPGA, MPGA and GA. It verifies that the accuracies of feature subsets acquired from HMPGA, MPGA and GA are superior to three filter approaches. Meanwhile, nonparametric Wilcoxon signed rank test is held to confirm that HMPGA is better than MPGA and GA. HMPGA not only can be applied to feature selection of credit scoring, but also can be applied to more fields of data mining.

论文关键词:Credit scoring,Feature selection,Hybrid approach,HMPGA

论文评审过程:Received 2 November 2016, Revised 16 April 2017, Available online 17 May 2017, Version of Record 17 October 2017.

论文官网地址:https://doi.org/10.1016/j.cam.2017.04.036