Evaluation of k-Nearest Neighbor classifier performance for direct marketing

作者：

Highlights：

•

摘要

Text data mining is a process of exploratory data analysis. Classification maps data into predefined groups or classes. It is often referred to as supervised learning because the classes are determined before examining the data. This paper describes the proposed k-Nearest Neighbor classifier that performs comparative cross-validation for the existing k-Nearest Neighbor classifier. The feasibility and the benefits of the proposed approach are demonstrated by means of data mining problem: direct marketing. Direct marketing has become an important application field of data mining. Comparative cross-validation involves estimation of accuracy by either stratified k-fold cross-validation or equivalent repeated random subsampling. While the proposed method may have a high bias; its performance (accuracy estimation in our case) may be poor due to a high variance. Thus the accuracy with the proposed k-Nearest Neighbor classifier was less than that with the existing k-Nearest Neighbor classifier, and the smaller the improvement in runtime the larger the improvement in precision and recall. In our proposed method we have determined the classification accuracy and prediction accuracy where the prediction accuracy is comparatively high.

论文关键词：Data mining,Cross-validation,k-Nearest Neighbor,Runtime,Accuracy

论文评审过程：Available online 9 May 2009.

论文官网地址：https://doi.org/10.1016/j.eswa.2009.04.055