Error estimation based on variance analysis of k-fold cross-validation

Authors:

Highlights:

• When the numbers of samples and folds are both large enough, we prove that the CV variance has a quadratic relationship with the CV accuracy.

• The relationships between the CV variance and its influencing factors are derived, making it possible to predict which variance is smaller before applying k-fold CV.

• Theoretical explanations are given, from the perspective of variance analysis, for some of the empirical evidence reported by Rodriguez and Kohavi.

• The proposed normalized variance is significantly correlated with the error and is independent of k, so it can serve as a stable error measure.
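The highlights concern the variance of k-fold cross-validation estimates. The Python sketch below is a rough illustration only. It is not the authors' method, and it does not implement their normalized variance, which this excerpt does not define. It computes the mean and sample variance of per-fold accuracies for a hypothetical toy nearest-centroid classifier on 1-D data. It also includes the textbook binomial variance p(1-p)/m of a single fold's accuracy. That formula assumes the m test predictions in a fold are independent Bernoulli trials with success probability p, and it is one simple way to see how a variance can be quadratic in accuracy. It is an assumption, not the paper's derivation. All function names are illustrative.

```python
import random
import statistics
from typing import Dict, List, Sequence, Tuple


def kfold_indices(n: int, k: int, seed: int = 0) -> List[List[int]]:
    """Shuffle 0..n-1 and split into k folds whose sizes differ by at most one."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    # Fold i takes every k-th shuffled index starting at position i.
    return [idx[i::k] for i in range(k)]


def nearest_centroid_fit(xs: Sequence[float], ys: Sequence[int]) -> Dict[int, float]:
    """Hypothetical toy classifier: store the mean feature value of each class."""
    centroids = {}
    for label in set(ys):
        vals = [x for x, y in zip(xs, ys) if y == label]
        centroids[label] = sum(vals) / len(vals)
    return centroids


def nearest_centroid_predict(model: Dict[int, float], x: float) -> int:
    # Predict the class whose centroid is closest to x.
    return min(model, key=lambda label: abs(x - model[label]))


def kfold_cv(xs: Sequence[float], ys: Sequence[int], k: int, seed: int = 0) -> Tuple[float, float]:
    """Return (mean accuracy, sample variance of per-fold accuracies); needs k >= 2."""
    folds = kfold_indices(len(xs), k, seed)
    accs = []
    for test in folds:
        test_set = set(test)
        train = [i for i in range(len(xs)) if i not in test_set]
        model = nearest_centroid_fit([xs[i] for i in train], [ys[i] for i in train])
        correct = sum(nearest_centroid_predict(model, xs[i]) == ys[i] for i in test)
        accs.append(correct / len(test))
    return statistics.mean(accs), statistics.variance(accs)


def bernoulli_fold_variance(p: float, m: int) -> float:
    """Variance of a fold accuracy if its m predictions were i.i.d. Bernoulli(p).

    This is an illustrative assumption; p * (1 - p) is quadratic in the accuracy p.
    """
    return p * (1.0 - p) / m


if __name__ == "__main__":
    # Synthetic two-class data: class 0 ~ N(0, 1), class 1 ~ N(2, 1).
    rng = random.Random(42)
    xs = [rng.gauss(0.0, 1.0) for _ in range(200)] + [rng.gauss(2.0, 1.0) for _ in range(200)]
    ys = [0] * 200 + [1] * 200
    for k in (5, 10):
        mean_acc, var_acc = kfold_cv(xs, ys, k)
        # Compare the observed per-fold variance with the Bernoulli reference value.
        ref = bernoulli_fold_variance(mean_acc, len(xs) // k)
        print(k, round(mean_acc, 3), round(var_acc, 6), round(ref, 6))
```

Changing `k` alters both the fold size and the amount of training data, so the observed variance and the Bernoulli reference can diverge. The independence assumption behind the reference value ignores the correlation that overlapping training sets introduce.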

Keywords: Error estimation, k-fold cross-validation, Variance analysis, Model selection

Review history: Received 25 July 2016, Revised 24 February 2017, Accepted 22 March 2017, Available online 14 April 2017, Version of Record 22 April 2017.

Paper URL: https://doi.org/10.1016/j.patcog.2017.03.025