Effects of feature construction on classification performance: An empirical study in bank failure prediction

Abstract

While extensive research in data mining has been devoted to developing better classification algorithms, relatively little research has examined the effects of feature construction, guided by domain knowledge, on classification performance. However, in many application domains, domain knowledge can be used to construct higher-level features that may improve performance. For example, past research and regulatory practice in early warning of bank failures have resulted in various explanatory variables, in the form of financial ratios, that are constructed from bank accounting variables and are believed to be more effective than the original variables in identifying potential problem banks. In this study, we empirically compare the performance of two sets of classifiers for bank failure prediction, one built using raw accounting variables and the other built using constructed financial ratios. Four popular data mining methods are used to learn the classifiers: logistic regression, decision tree, neural network, and k-nearest neighbor. We evaluate the classifiers on the basis of expected misclassification cost under a wide range of possible settings. The results of the study strongly indicate that feature construction, guided by domain knowledge, significantly improves classifier performance and that the degree of improvement varies significantly across the methods.
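
The two ideas the abstract relies on, constructing ratio features from raw accounting variables and scoring a classifier by expected misclassification cost, can be illustrated with a minimal sketch. Everything named here is an assumption for illustration: the field names (`total_assets`, `total_equity`, `net_income`), the two example ratios, and the cost formula (a common textbook form that weights each error rate by class prior and unit cost) are not taken from the paper, which may define its variables and cost measure differently.

```python
def construct_ratios(bank):
    """Build ratio features from raw accounting variables.

    Hypothetical field names and ratios, chosen only for illustration;
    they are not the variables used in the study.
    """
    assets = bank["total_assets"]
    return {
        "equity_to_assets": bank["total_equity"] / assets,
        "net_income_to_assets": bank["net_income"] / assets,
    }


def expected_misclassification_cost(fn_rate, fp_rate, prior_fail, cost_fn, cost_fp):
    """Expected cost per bank under an assumed, commonly used formulation.

    fn_rate:    share of failing banks predicted as healthy
    fp_rate:    share of healthy banks predicted as failing
    prior_fail: assumed prior probability that a bank fails
    cost_fn:    assumed cost of missing a failing bank
    cost_fp:    assumed cost of flagging a healthy bank
    """
    missed_failures = prior_fail * fn_rate * cost_fn
    false_alarms = (1.0 - prior_fail) * fp_rate * cost_fp
    return missed_failures + false_alarms


# Usage with made-up numbers: one bank record, then a small sweep over cost ratios.
example_bank = {"total_assets": 200.0, "total_equity": 20.0, "net_income": 2.0}
example_ratios = construct_ratios(example_bank)

cost_by_ratio = {
    cost_fn: expected_misclassification_cost(0.2, 0.1, 0.05, cost_fn, 1.0)
    for cost_fn in (1.0, 10.0, 50.0)
}
```

Sweeping the prior and the cost ratio, as in the last lines, is one way to compare classifiers "under a wide range of possible settings" when the true costs are unknown.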

Keywords: Data mining, Classification, Feature construction, Bank failure prediction, Financial ratios

Publication history: Available online 14 February 2008.

Official paper URL: https://doi.org/10.1016/j.eswa.2008.01.053