Learning Instance Weighted Naive Bayes from labeled and unlabeled data

Author: Liangxiao Jiang

Abstract

In real-world data mining applications, unlabeled instances are often abundant, while labeled instances are very limited. Semi-supervised learning, which attempts to benefit from a large amount of unlabeled data together with the labeled data, has therefore attracted much attention from researchers. In this paper, we propose a very fast yet highly effective semi-supervised learning algorithm, which we call Instance Weighted Naive Bayes (IWNB). IWNB first trains a naive Bayes classifier on the labeled instances only. The trained classifier is then used to estimate the class membership probabilities of the unlabeled instances, and these estimated probabilities are used to label and weight the unlabeled instances. Finally, a naive Bayes classifier is trained again on both the originally labeled data and the newly labeled and weighted unlabeled data. Our experimental results on a large number of UCI data sets show that IWNB often improves the classification accuracy of the original naive Bayes when the available labeled data are very limited.
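The four steps described in the abstract can be illustrated with a minimal Python sketch. It rests on several assumptions not stated in the abstract: attributes are nominal and encoded as integers, probabilities use Laplace smoothing, labeled instances get weight 1, and each unlabeled instance is labeled with its most probable class and weighted by that class's estimated probability. The paper's exact weighting scheme may differ, and the function names `train_nb`, `predict_proba`, and `iwnb` are hypothetical.

```python
import math

def train_nb(X, y, w, n_values, classes):
    """Weighted categorical naive Bayes with Laplace smoothing (assumed)."""
    class_w = {c: 0.0 for c in classes}  # summed instance weight per class
    cond_w = {}                          # summed weight per (class, attribute, value)
    for xi, yi, wi in zip(X, y, w):
        class_w[yi] += wi
        for j, v in enumerate(xi):
            cond_w[(yi, j, v)] = cond_w.get((yi, j, v), 0.0) + wi
    total = sum(class_w.values())
    prior = {c: (class_w[c] + 1) / (total + len(classes)) for c in classes}
    return {"classes": classes, "n_values": n_values,
            "prior": prior, "class_w": class_w, "cond_w": cond_w}

def predict_proba(model, x):
    """Class membership probabilities P(c | x), computed in log space."""
    logp = {}
    for c in model["classes"]:
        s = math.log(model["prior"][c])
        for j, v in enumerate(x):
            num = model["cond_w"].get((c, j, v), 0.0) + 1
            den = model["class_w"][c] + model["n_values"][j]
            s += math.log(num / den)
        logp[c] = s
    m = max(logp.values())
    exp = {c: math.exp(s - m) for c, s in logp.items()}
    z = sum(exp.values())
    return {c: e / z for c, e in exp.items()}

def iwnb(X_lab, y_lab, X_unlab, n_values, classes):
    # Step 1: train naive Bayes on the labeled instances only (weight 1 each).
    nb = train_nb(X_lab, y_lab, [1.0] * len(X_lab), n_values, classes)
    X_all, y_all, w_all = list(X_lab), list(y_lab), [1.0] * len(X_lab)
    for x in X_unlab:
        # Step 2: estimate class membership probabilities of the unlabeled instance.
        probs = predict_proba(nb, x)
        # Step 3 (assumed rule): label with the most probable class,
        # weight by that class's estimated probability.
        label = max(probs, key=probs.get)
        X_all.append(x)
        y_all.append(label)
        w_all.append(probs[label])
    # Step 4: retrain on labeled plus newly labeled, weighted instances.
    return train_nb(X_all, y_all, w_all, n_values, classes)
```

For example, with two binary attributes and two classes, `model = iwnb([(0, 0), (1, 1)], [0, 1], [(0, 0), (0, 1), (1, 1), (1, 0)], [2, 2], [0, 1])` builds the final classifier, and `predict_proba(model, (1, 1))` assigns most of the probability to class 1. Weighting by the estimated probability lets confidently labeled unlabeled instances contribute more to the retrained model than uncertain ones.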

Keywords: Semi-supervised learning, Unlabeled data, Naive Bayes, Instance Weighted Naive Bayes (IWNB), Class membership probability, Classification

Paper URL: https://doi.org/10.1007/s10844-011-0153-8