Text Classification from Labeled and Unlabeled Documents using EM
Authors: Kamal Nigam, Andrew Kachites McCallum, Sebastian Thrun, Tom Mitchell
Abstract
This paper shows that the accuracy of learned text classifiers can be improved by augmenting a small number of labeled training documents with a large pool of unlabeled documents. This is important because in many text classification problems obtaining training labels is expensive, while large quantities of unlabeled documents are readily available.
Keywords: text classification, Expectation-Maximization, integrating supervised and unsupervised learning, combining labeled and unlabeled data, Bayesian learning
Paper URL: https://doi.org/10.1023/A:1007692713085