Using the doubling dimension to analyze the generalization of learning algorithms

Abstract

Given a set F of classifiers and a probability distribution over their domain, one can define a metric by taking the distance between a pair of classifiers to be the probability that they classify a random item differently. We prove bounds on the sample complexity of PAC learning in terms of the doubling dimension of this metric. These bounds imply known bounds on the sample complexity of learning halfspaces with respect to the uniform distribution that are optimal up to a constant factor. We then prove a bound that holds for any algorithm that outputs a classifier with zero error whenever this is possible; this bound is in terms of the maximum of the doubling dimension and the VC-dimension of F, and it strengthens the best known bound in terms of the VC-dimension alone. Finally, we show that there is no bound on the doubling dimension of halfspaces in R^n in terms of n that holds independently of the domain distribution. This implies that there is no such bound in terms of the VC-dimension of F (in contrast with the metric dimension).
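
A brief formal restatement of the two notions the abstract relies on may help. The metric is exactly the one described above; the doubling dimension is stated here in its standard form, and the paper's precise convention and constants may differ:

    d_P(f, g) \;=\; \Pr_{x \sim P}\bigl[f(x) \neq g(x)\bigr], \qquad f, g \in F,

    \operatorname{ddim}(F, d_P) \;=\; \min\bigl\{\, d \ge 0 : \text{every ball of radius } r \text{ in } (F, d_P) \text{ can be covered by at most } 2^{d} \text{ balls of radius } r/2 \,\bigr\}.

Here P is the domain distribution; the sample-complexity bounds described in the abstract are stated in terms of ddim(F, d_P), and, for algorithms that output a zero-error classifier whenever one exists, in terms of its maximum with the VC-dimension of F.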

Keywords: PAC learning, Generalization, Doubling dimension, Doubling metric, Learning theory, Statistical learning theory, Local complexity

Article history: Received 17 October 2007, Revised 23 January 2009, Available online 31 January 2009.

DOI: https://doi.org/10.1016/j.jcss.2009.01.003