A boosting Self-Training Framework based on Instance Generation with Natural Neighbors for K Nearest Neighbor

Authors: Junnan Li, Qingsheng Zhu

Abstract

Self-training is one of the most successful methodologies for semi-supervised classification. Mislabeling is the most challenging issue in self-training methods, and ensemble learning is one of the common techniques for dealing with it. Specifically, ensemble learning can solve or alleviate mislabeling by constructing an ensemble classifier that improves prediction accuracy during the self-training process. However, most ensemble learning methods may not perform well in self-training because it is difficult for them to train an effective ensemble classifier with a small amount of labeled data. Inspired by successful boosting methods, we introduce a new boosting self-training framework based on instance generation with natural neighbors (BoostSTIG) in this paper. BoostSTIG is compatible with most boosting methods and self-training methods: it can use most boosting methods to solve or alleviate the mislabeling of existing self-training methods by improving prediction accuracy in the self-training process. In addition, an instance generation scheme based on natural neighbors is proposed to enlarge the initial labeled data in BoostSTIG, which makes boosting methods more suitable for self-training. In experiments, we apply the BoostSTIG framework to 2 self-training methods and 4 boosting methods, and validate BoostSTIG by comparing it with several state-of-the-art techniques on real data sets. Intensive experiments show that BoostSTIG improves the performance of the tested self-training methods and trains an effective k-nearest-neighbor classifier.
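As a rough illustration of the generic self-training loop that BoostSTIG builds on (this is a minimal sketch of plain kNN self-training, not the authors' BoostSTIG algorithm; the `threshold` parameter and vote-fraction confidence are illustrative assumptions): a kNN classifier trained on the labeled pool predicts labels for unlabeled points, and only high-confidence predictions are added to the labeled pool for the next iteration.

```python
import math
from collections import Counter

def knn_predict(X_lab, y_lab, x, k=3):
    """Predict a label for x by majority vote among its k nearest labeled points.

    Returns (label, confidence), where confidence is the crude vote fraction.
    """
    dists = sorted((math.dist(p, x), label) for p, label in zip(X_lab, y_lab))
    votes = Counter(label for _, label in dists[:k])
    label, count = votes.most_common(1)[0]
    return label, count / k

def self_train(X_lab, y_lab, X_unl, k=3, threshold=0.6, max_iter=10):
    """Plain self-training: repeatedly move confidently labeled points
    from the unlabeled pool into the labeled pool."""
    X_lab, y_lab, X_unl = list(X_lab), list(y_lab), list(X_unl)
    for _ in range(max_iter):
        confident = []  # (index into X_unl, predicted label)
        for i, x in enumerate(X_unl):
            label, conf = knn_predict(X_lab, y_lab, x, k)
            if conf >= threshold:
                confident.append((i, label))
        if not confident:
            break  # nothing confident enough; stop early
        # pop from the back so earlier indices stay valid
        for i, label in reversed(confident):
            X_lab.append(X_unl.pop(i))
            y_lab.append(label)
    return X_lab, y_lab
```

Mislabeling arises in this loop when an early wrong prediction enters the labeled pool and biases later iterations, which is the failure mode the paper's boosting-based framework targets.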

Keywords: Semi-supervised learning (SSL), Semi-supervised classification (SSC), Self-training, Boosting, Instance generation, Natural neighbors


Paper link: https://doi.org/10.1007/s10489-020-01732-1