Learning the set covering machine by bound minimization and margin-sparsity trade-off

Authors: François Laviolette, Mario Marchand, Mohak Shah, Sara Shanian

Abstract

We investigate classifiers in the sample compression framework that can be specified by two distinct sources of information: a compression set and a message string of additional information. In the compression setting, a reconstruction function specifies a classifier when given this information. We examine how an efficient redistribution of this reconstruction information can lead to more general classifiers. In particular, we derive risk bounds that provide explicit control over the sparsity of the classifier and the magnitude of its separating margin, together with the ability to perform a margin-sparsity trade-off in favor of better classifiers. We show how applying these bounds to the set covering machine algorithm results in novel learning strategies. We also show that these risk bounds are tighter than their traditional counterparts, such as VC-dimension and Rademacher complexity-based bounds, that explicitly take into account the hypothesis class complexity. Finally, we show how these bounds can guide model selection for the set covering machine algorithm, enabling it to learn by bound minimization.

Keywords: Set covering machine, Sample compression, Risk bounds, Margin-sparsity trade-off, Bound minimization

Official paper URL: https://doi.org/10.1007/s10994-009-5137-3