Combining active learning and semi-supervised learning to construct SVM classifier

Abstract

One key issue for most classification algorithms is that they need large amounts of labeled samples to train the classifier. Since manual labeling is time-consuming, researchers have proposed active learning and semi-supervised learning techniques to reduce the manual labeling workload. Active learning and semi-supervised learning are complementary to some degree, so some studies combine them to reduce the labeling workload further. However, studies that combine active learning and semi-supervised learning for SVM classifiers are rare. Among the many SVM active learning algorithms, the most popular is the one that queries the sample closest to the current classification hyperplane in each iteration, denoted SVMAL in this paper. SVMAL is interested only in samples that are likely to lie on the class boundary and ignores the large number of remaining unlabeled samples. This paper therefore designs a semi-supervised learning algorithm that makes full use of the remaining non-queried samples, and combines it with SVMAL to form a new active semi-supervised SVM algorithm. The proposed algorithm uses active learning to select class boundary samples and semi-supervised learning to select class central samples, because class central samples are believed to describe the class distribution better and to help SVMAL find the boundary samples more precisely. To avoid introducing too many labeling errors when exploring class central samples, the label changing rate is used to ensure the reliability of the predicted labels. Experimental results show that the proposed active semi-supervised SVM algorithm performs much better than the pure SVM active learning algorithm and can thus further reduce the manual labeling workload.
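The two selection steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not define the label changing rate or say how class central samples are identified, so the sketch assumes that (a) the label changing rate is the fraction of consecutive iterations in which a sample's predicted label flipped, and (b) class central samples are the stable samples farthest from the hyperplane. All function names, the `max_rate` threshold, and the parameter `k` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def query_closest_to_hyperplane(decision_values: Dict[str, float]) -> str:
    """SVMAL-style query: return the unlabeled sample with the smallest |f(x)|."""
    return min(decision_values, key=lambda sid: abs(decision_values[sid]))


def label_changing_rate(label_history: Sequence[int]) -> float:
    """Fraction of consecutive iterations in which the predicted label flipped.

    Assumption: this definition is illustrative; the abstract does not give one.
    """
    if len(label_history) < 2:
        return 1.0  # assumption: too little history is treated as unreliable
    flips = sum(a != b for a, b in zip(label_history, label_history[1:]))
    return flips / (len(label_history) - 1)


def select_central_samples(
    decision_values: Dict[str, float],
    histories: Dict[str, Sequence[int]],
    max_rate: float = 0.0,
    k: int = 2,
) -> List[Tuple[str, int]]:
    """Pick up to k stable samples far from the hyperplane, with predicted labels.

    Assumption: "class central" is approximated by a large |f(x)| combined
    with a label changing rate no greater than max_rate.
    """
    stable = [
        sid for sid in decision_values
        if label_changing_rate(histories[sid]) <= max_rate
    ]
    stable.sort(key=lambda sid: abs(decision_values[sid]), reverse=True)
    return [(sid, 1 if decision_values[sid] >= 0 else -1) for sid in stable[:k]]


# Toy data: signed distances to the hyperplane and per-iteration predictions.
decision_values = {"a": 0.05, "b": -1.8, "c": 2.1, "d": -0.4}
histories = {"a": [1, -1, 1], "b": [-1, -1, -1], "c": [1, 1, 1], "d": [-1, 1, -1]}

boundary_query = query_closest_to_hyperplane(decision_values)
central = select_central_samples(decision_values, histories)
print(boundary_query)  # sample sent to the human annotator
print(central)         # samples added with their predicted labels
```

In practice the decision values would come from the current SVM, for example from scikit-learn's `SVC.decision_function`, and both the queried sample and the selected central samples would be added to the training set before retraining.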

Keywords: Active learning, Semi-supervised learning, Support vector machines, Discriminating speech from non-speech, Label changing rate

Review history: Received 26 February 2012, Revised 18 December 2012, Accepted 30 January 2013, Available online 20 February 2013.

Official paper URL: https://doi.org/10.1016/j.knosys.2013.01.032