The Effect of Instance-Space Partition on Significance

Authors: Jeffrey P. Bradford, Carla E. Brodley

Abstract

This paper demonstrates experimentally that deciding which induction algorithm is more accurate from the results of a single partition of the instances into cross-validation folds may lead to statistically erroneous conclusions. Comparing two decision-tree induction algorithms and one naive Bayes induction algorithm, we find situations in which one algorithm is judged more accurate at the p = 0.05 level with one partition of the training instances, but the other algorithm is judged more accurate at the p = 0.05 level with an alternate partition. We recommend a new significance procedure that performs cross-validation using multiple instance-space partitions. Significance is determined by applying the paired Student t-test separately to the results from each cross-validation partition, averaging the resulting values, and converting this averaged value into a significance value.
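The recommended procedure can be sketched roughly as follows. This is an illustrative sketch, not the authors' implementation. It assumes that the value being averaged is the paired t statistic from each partition, and that the average is converted to a two-sided p-value using a t distribution with k - 1 degrees of freedom; the abstract does not spell out either detail. The names `fold_indices`, `averaged_t_significance`, and the `evaluate` callback are hypothetical, and the demonstration data are synthetic.

```python
import math
import random
from statistics import mean, stdev


def paired_t(diffs):
    """Paired Student t statistic for per-fold accuracy differences."""
    sd = stdev(diffs)
    if sd == 0.0:
        raise ValueError("fold differences have zero variance")
    return mean(diffs) / (sd / math.sqrt(len(diffs)))


def t_two_sided_p(t, df, steps=2000):
    """Two-sided p-value for Student's t, via Simpson integration of the pdf."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def pdf(x):
        return c * (1 + x * x / df) ** (-(df + 1) / 2)

    a = abs(t)
    h = a / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3  # probability mass between 0 and |t|
    return max(0.0, 1.0 - 2.0 * area)


def fold_indices(n, k, rng):
    """One random partition of n instance indices into k folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]


def averaged_t_significance(evaluate, n, k=10, n_partitions=10, seed=0):
    """Run k-fold CV on several partitions and average the paired t values.

    evaluate(train_idx, test_idx) must return (accuracy_a, accuracy_b).
    ASSUMPTION: the t statistics are averaged and looked up with k - 1
    degrees of freedom.
    """
    rng = random.Random(seed)
    t_values = []
    for _ in range(n_partitions):
        folds = fold_indices(n, k, rng)
        diffs = []
        for i, test in enumerate(folds):
            train = [j for m, f in enumerate(folds) if m != i for j in f]
            acc_a, acc_b = evaluate(train, test)
            diffs.append(acc_a - acc_b)
        t_values.append(paired_t(diffs))
    t_bar = mean(t_values)
    return t_bar, t_two_sided_p(t_bar, k - 1), t_values


# Synthetic stand-in for two trained classifiers: fixed per-instance
# correctness flags, so the training indices are ignored here.
_rng = random.Random(42)
A_CORRECT = [_rng.random() < 0.80 for _ in range(200)]
B_CORRECT = [_rng.random() < 0.75 for _ in range(200)]


def toy_evaluate(train_idx, test_idx):
    acc_a = sum(A_CORRECT[i] for i in test_idx) / len(test_idx)
    acc_b = sum(B_CORRECT[i] for i in test_idx) / len(test_idx)
    return acc_a, acc_b


t_bar, p_value, t_values = averaged_t_significance(toy_evaluate, n=200)
print("per-partition t:", [round(t, 2) for t in t_values])
print("averaged t:", round(t_bar, 3), "p:", round(p_value, 4))
```

Printing the per-partition t values next to the averaged one shows how much a verdict based on a single partition can vary. In a real comparison, `evaluate` would retrain both induction algorithms on `train_idx` and score them on `test_idx`.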

Keywords: classification, comparative studies, statistical tests of significance, cross-validation

Paper URL: https://doi.org/10.1023/A:1007613918580