Voting over Multiple Condensed Nearest Neighbors
Author: Ethem Alpaydin
Abstract
Lazy learning methods like the k-nearest neighbor classifier require storing the whole training set and may be too costly when this set is large. The condensed nearest neighbor classifier incrementally stores a subset of the sample, thus decreasing storage and computation requirements. We propose to train multiple such subsets and take a vote over them, thus combining predictions from a set of concept descriptions. We investigate two voting schemes: simple voting, where voters have equal weight, and weighted voting, where weights depend on the classifiers' confidences in their predictions. We consider ways to form such subsets for improved performance: When the training set is small, voting improves performance considerably. If the training set is not small, the voters converge to similar solutions and we gain nothing by voting. To alleviate this, when the training set is of intermediate size, we use bootstrapping to generate smaller training sets over which we train the voters. When the training set is large, we partition it into smaller, mutually exclusive subsets and then train the voters. We report simulation results on six datasets, which show good performance. We also give a review of methods for combining multiple learners. The idea of taking a vote over multiple learners can be applied with any type of learning scheme.
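The abstract includes no code, so the following is a minimal Python sketch of the two core pieces it describes: Hart-style condensation (incrementally storing only points the current subset misclassifies) and a simple equal-weight vote over several condensed subsets, here diversified with bootstrap resampling as suggested for intermediate-size training sets. The names `condense` and `vote`, the 1-NN rule, Euclidean distance, and the synthetic data are illustrative assumptions, not the paper's implementation.

```python
import numpy as np
from collections import Counter

def condense(X, y, rng):
    """Hart-style condensation: greedily store points that the
    currently stored subset misclassifies (a sketch, not the paper's code)."""
    order = rng.permutation(len(X))
    keep = [order[0]]                       # seed with one random point
    changed = True
    while changed:                          # repeat passes until no additions
        changed = False
        for i in order:
            Xk, yk = X[keep], y[keep]
            nearest = np.argmin(np.linalg.norm(Xk - X[i], axis=1))
            if yk[nearest] != y[i]:         # misclassified -> store it
                keep.append(i)
                changed = True
    return np.array(keep)

def vote(subsets, X, y, x):
    """Simple (equal-weight) vote of 1-NN classifiers, one per condensed subset."""
    preds = []
    for keep in subsets:
        Xk, yk = X[keep], y[keep]
        preds.append(yk[np.argmin(np.linalg.norm(Xk - x, axis=1))])
    return Counter(preds).most_common(1)[0][0]

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))               # synthetic two-class data
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Train each voter on a bootstrap resample, then condense it.
subsets = []
for _ in range(5):
    idx = rng.integers(0, len(X), size=len(X))
    subsets.append(idx[condense(X[idx], y[idx], rng)])

print(vote(subsets, X, y, np.array([0.5, 0.5])))
```

For large training sets, the bootstrap loop would be replaced by a partition of the indices into mutually exclusive subsets, with one voter condensed from each part.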
Keywords: lazy learning, nonparametric estimation, k-nearest neighbor, condensed nearest neighbor, voting
Paper link: https://doi.org/10.1023/A:1006563312922