Unsupervised stratification of cross-validation for accuracy estimation

Authors:

Abstract

The rapid development of new learning algorithms increases the need for improved accuracy estimation methods. Moreover, methods that allow several learning algorithms to be compared are important for evaluating new ones. In this paper we propose new accuracy estimation methods that extend k-fold cross-validation. The proposed methods construct cross-validation folds deterministically rather than by random sampling. The folds are constructed by unsupervised stratification, which exploits the distribution of instances in the instance space. Our methods are based either on the one-center approach or on clustering procedures. They attempt to construct more representative folds, thereby reducing the bias of the resulting estimator. At the same time, because no randomness is involved, they allow direct comparison of the performance of learning algorithms across different experiments. We report a simulation experiment that examines the performance of the proposed methods and shows their behavior in a variety of situations. The new methods mainly reduce the bias of the estimator.
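
As a rough illustration of deterministic fold construction, the sketch below ranks instances by their distance to the centre of the data and deals them round-robin into k folds, so that each fold receives instances from every distance band. This is only one plausible reading of a one-center style stratification. The abstract does not specify the procedure, and the function name `one_center_folds`, the use of the feature-wise mean as the centre, the Euclidean metric, and the round-robin assignment are all assumptions made for illustration.

```python
import math


def one_center_folds(instances, k):
    """Deterministically assign instances to k folds (illustrative sketch).

    instances: list of equal-length numeric tuples (class labels are not used)
    returns:   list of k lists holding instance indices
    """
    n = len(instances)
    if k < 2 or k > n:
        raise ValueError("k must be between 2 and the number of instances")

    dim = len(instances[0])
    # Assumed centre of the data: the per-feature mean.
    center = [sum(x[d] for x in instances) / n for d in range(dim)]

    # Rank instances by distance to the centre; ties are broken by the
    # original index so the ordering never depends on chance.
    order = sorted(range(n), key=lambda i: (math.dist(instances[i], center), i))

    # Deal ranked instances round-robin so every fold spans the full
    # range of distances from the centre.
    folds = [[] for _ in range(k)]
    for rank, idx in enumerate(order):
        folds[rank % k].append(idx)
    return folds


# Hypothetical toy data, used only to exercise the function.
data = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5), (2.0, 2.0)]
folds = one_center_folds(data, 3)
```

Because no random numbers are drawn, repeated calls on the same data return the same folds, which is the property that makes results from separate experiments directly comparable.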

Keywords: Machine learning, Inductive learning, Cross-validation, Accuracy estimation, Clustering

Article history: Received 7 October 1997, Revised 25 September 1998, Available online 2 August 2000.

Paper URL: https://doi.org/10.1016/S0004-3702(99)00094-6