A class boundary preserving algorithm for data condensation

Abstract

Instance-based machine learning algorithms often have to store large numbers of training instances. This leads to high memory usage, long response times, and often oversensitivity to noise. To overcome these problems, various instance reduction algorithms have been developed to remove noisy and surplus instances. This paper discusses existing algorithms in the field of instance selection and abstraction, and introduces a new approach, the Class Boundary Preserving Algorithm (CBP), a multi-stage method for pruning the training set based on a simple but very effective heuristic for instance removal. CBP is tested on a large number of datasets and comparatively evaluated against eight of the most successful instance-based condensation algorithms. The experiments show that CBP achieves similar classification accuracy, with much improved storage reduction and competitive execution speed.
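To make the idea of removing noisy instances concrete, the sketch below implements Wilson-style edited nearest neighbour filtering, a classic baseline from the instance reduction literature. It is not CBP, whose stages are not specified in this abstract. The function names, the Euclidean distance, and the default k = 3 are assumptions chosen for illustration.

```python
from collections import Counter
import math


def knn_label(query, points, labels, k, skip=None):
    """Majority label among the k nearest points to `query` (Euclidean).

    `skip` is an optional index to exclude, so that an instance does not
    vote for itself.
    """
    dists = sorted(
        (math.dist(query, p), i) for i, p in enumerate(points) if i != skip
    )
    votes = Counter(labels[i] for _, i in dists[:k])
    return votes.most_common(1)[0][0]


def edited_nearest_neighbor(points, labels, k=3):
    """Keep an instance only if the majority of its k nearest neighbours
    (excluding itself) share its label. Illustrative baseline, not CBP."""
    keep = [
        i
        for i in range(len(points))
        if knn_label(points[i], points, labels, k, skip=i) == labels[i]
    ]
    return [points[i] for i in keep], [labels[i] for i in keep]


# Two well-separated clusters plus one mislabelled point inside cluster "A".
pts = [(0, 0), (0, 1), (1, 0), (1, 1),
       (5, 5), (5, 6), (6, 5), (6, 6),
       (0.5, 0.5)]
labs = ["A"] * 4 + ["B"] * 4 + ["B"]
kept_pts, kept_labs = edited_nearest_neighbor(pts, labs, k=3)
```

In this toy input the mislabelled point at (0.5, 0.5) is dropped while the eight cluster points are kept. A filter of this kind only removes noise; reducing surplus instances, as the abstract describes, requires an additional condensation step.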

Keywords: Machine learning, Instance-based learning, Instance condensation

Article history: Received 30 January 2010, Revised 18 June 2010, Accepted 8 August 2010, Available online 13 August 2010.

Official paper URL: https://doi.org/10.1016/j.patcog.2010.08.014