Inference on the prediction of ensembles of infinite size

Authors:

Abstract

In this paper we introduce a framework for making statistical inferences about the asymptotic prediction of parallel classification ensembles. The analysis is valid under fairly general conditions: it requires only that the individual classifiers are generated in independent executions of some randomized learning algorithm, and that the final ensemble prediction is made by majority voting. Given an unlabeled test instance, the predictions of the classifiers in the ensemble are obtained sequentially. As the individual predictions become known, Bayes' theorem is used to update an estimate of the probability that the class predicted by the current ensemble coincides with the class predicted by the corresponding ensemble of infinite size. Using this estimate, the voting process can be halted when the confidence in the asymptotic prediction is sufficiently high. An empirical investigation on several benchmark classification problems shows that, for most test instances, only a small number of classifiers need to be queried to converge to the infinite ensemble prediction with a high degree of confidence. For these instances, the difference between the generalization error of the finite ensemble and that of the infinite ensemble limit is very small, often negligible.
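The sequential voting procedure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a two-class problem, a uniform prior on the fraction of classifiers that vote for each class, and votes that are independent draws. Under those assumptions the posterior is a Beta distribution, and the probability that the current majority class is also the infinite-ensemble prediction reduces to a finite binomial sum. The function names and the 0.99 confidence threshold are hypothetical.

```python
from math import comb


def prob_majority_matches_limit(votes_a, votes_b):
    """Posterior probability that the current leading class is also the
    majority class of the infinite ensemble.

    Assumptions (not taken from the paper): two classes and a uniform
    prior on the vote fraction p, so the posterior over the leader's
    fraction is Beta(lead + 1, trail + 1). For integer parameters,
    P(p > 1/2) equals the binomial sum computed below.
    """
    lead, trail = max(votes_a, votes_b), min(votes_a, votes_b)
    n = lead + trail + 1
    return sum(comb(n, j) for j in range(lead + 1)) / 2 ** n


def vote_until_confident(predictions, confidence=0.99):
    """Query class labels (0 or 1) one at a time and stop as soon as the
    posterior confidence in the current majority reaches the threshold.

    Returns (predicted_class, classifiers_queried, posterior_probability).
    """
    counts = [0, 0]
    queried = 0
    posterior = 0.5  # value under the uniform prior, before any vote
    for label in predictions:
        counts[label] += 1
        queried += 1
        posterior = prob_majority_matches_limit(counts[0], counts[1])
        if posterior >= confidence:
            break
    leader = 0 if counts[0] >= counts[1] else 1
    return leader, queried, posterior


if __name__ == "__main__":
    # A unanimous stream of votes stops early; a split stream does not.
    print(vote_until_confident([1] * 100))
    print(vote_until_confident([0, 1] * 50))
```

With unanimous votes the posterior after t votes is 1 - 2^-(t+1), so under these assumptions a 0.99 threshold is reached after six classifiers, whereas an evenly split stream never reaches it and exhausts the ensemble.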

Keywords: Classification ensembles, Classification trees, Bayesian inference, Infinite ensembles

Article history: Received 21 September 2009, Revised 17 December 2010, Accepted 25 December 2010, Available online 8 January 2011.

Official paper URL: https://doi.org/10.1016/j.patcog.2010.12.021