Text Classification from Labeled and Unlabeled Documents using EM

Authors: Kamal Nigam, Andrew Kachites McCallum, Sebastian Thrun, Tom Mitchell

Abstract

This paper shows that the accuracy of learned text classifiers can be improved by augmenting a small number of labeled training documents with a large pool of unlabeled documents. This is important because in many text classification problems obtaining training labels is expensive, while large quantities of unlabeled documents are readily available.

Keywords: text classification, Expectation-Maximization, integrating supervised and unsupervised learning, combining labeled and unlabeled data, Bayesian learning

Paper URL: https://doi.org/10.1023/A:1007692713085