Incorporating domain knowledge into data mining classifiers: An application in indirect lending

Abstract

Data mining techniques have been applied to solve classification problems in a variety of applications such as credit scoring, bankruptcy prediction, insurance underwriting, and management fraud detection. In many of these application domains, there are human experts whose knowledge could have a bearing on the effectiveness of the classification decision. The lack of research on combining data mining techniques with domain knowledge has prompted researchers to identify the fusion of data mining and knowledge-based expert systems as an important future direction. In this paper, we compare the performance of seven data mining classification methods (naive Bayes, logistic regression, decision tree, decision table, neural network, k-nearest neighbor, and support vector machine) with and without incorporating domain knowledge. The application we focus on is in the domain of indirect bank lending. An expert system capturing a lending expert's knowledge of rating a borrower's credit is used in combination with data mining to study whether the incorporation of domain knowledge improves classification performance. We use two performance measures: misclassification cost and AUC (area under the curve). A 2 × 7 factorial, repeated-measures ANOVA, with the two factors being domain knowledge (present or absent) and data mining method (seven methods), as well as a special statistical test for comparing AUCs, is used for analyzing the results. Analysis of the results reveals that incorporation of domain knowledge significantly improves classification performance with respect to both misclassification cost and AUC. There is an interaction between classification method and domain knowledge: incorporating domain knowledge has a greater influence on performance for some methods than for others. Both measures, misclassification cost and AUC, yield similar results, indicating that the findings of the study are robust.
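The two performance measures named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the cost weights `cost_fp` and `cost_fn`, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration, and the abstract does not state the actual cost values used in the study.

```python
def misclassification_cost(y_true, y_pred, cost_fp=1.0, cost_fn=5.0):
    """Sum the cost of every wrong prediction.

    cost_fp and cost_fn are hypothetical weights (not taken from the paper):
    cost_fp is charged when a true 0 is predicted as 1,
    cost_fn is charged when a true 1 is predicted as 0.
    """
    total = 0.0
    for t, p in zip(y_true, y_pred):
        if t == 0 and p == 1:
            total += cost_fp
        elif t == 1 and p == 0:
            total += cost_fn
    return total


def auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the fraction of (positive, negative) pairs in which the
    positive example gets the higher score; ties count as one half.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data, invented purely to exercise the functions.
y_true = [1, 0, 1, 0, 1]
scores = [0.9, 0.6, 0.4, 0.3, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

print(misclassification_cost(y_true, y_pred))
print(auc(y_true, scores))
```

In a comparison like the one the abstract describes, both functions would be evaluated once on a classifier trained without the expert-system output and once on the same classifier trained with it as an additional input; how the paper actually combines the two is not specified in this abstract.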

Keywords: Data mining, Classification, Supervised learning, Domain knowledge, Expert system

Article history: Received 18 August 2007, Revised 27 March 2008, Accepted 29 June 2008, Available online 16 July 2008.

Official paper URL: https://doi.org/10.1016/j.dss.2008.06.013