Text classification method based on self-training and LDA topic models

Authors:

Highlights:

• A novel text classification method for learning from a very small labeled set.

• The method uses a text representation based on the LDA topic model.

• Self-training is used to enlarge the labeled set from unlabeled instances.

• A model for setting the methods' parameters for any document collection is proposed.
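
The self-training idea in the highlights can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes each document has already been converted to a vector of topic proportions (for example by an LDA implementation), and it substitutes a simple nearest-centroid classifier as the base learner. The function names, the confidence measure, and the `threshold` and `max_rounds` parameters are all hypothetical choices made for illustration.

```python
import math


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def distance(a, b):
    """Euclidean distance between two topic-proportion vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def fit(labeled):
    """Build one centroid per class from (vector, label) pairs."""
    by_class = {}
    for vec, label in labeled:
        by_class.setdefault(label, []).append(vec)
    return {label: centroid(vecs) for label, vecs in by_class.items()}


def predict(centroids, vec):
    """Return (label, confidence); confidence is a softmax over negative distances."""
    scores = {label: math.exp(-distance(c, vec)) for label, c in centroids.items()}
    total = sum(scores.values())
    best = max(scores, key=scores.get)
    return best, scores[best] / total


def self_train(labeled, unlabeled, threshold=0.55, max_rounds=10):
    """Grow the labeled set with confidently predicted unlabeled vectors.

    threshold and max_rounds are illustrative assumptions, not values
    taken from the paper.
    """
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(max_rounds):
        centroids = fit(labeled)
        confident, remaining = [], []
        for vec in pool:
            label, conf = predict(centroids, vec)
            if conf >= threshold:
                confident.append((vec, label))
            else:
                remaining.append(vec)
        if not confident:
            break  # nothing new to add, so stop early
        labeled.extend(confident)
        pool = remaining
    return fit(labeled), labeled


# Toy topic proportions over two topics (made-up data for illustration).
seed = [([0.9, 0.1], "sports"), ([0.1, 0.9], "politics")]
pool = [[0.8, 0.2], [0.2, 0.8], [0.5, 0.5]]
model, grown = self_train(seed, pool)
```

In this toy run the two clearly topical vectors are pseudo-labeled and added to the training set, while the ambiguous vector stays unlabeled because its confidence falls below the threshold.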

Keywords: Classification, Topic modeling, LDA, Semi-supervised learning, Self-training

Review history: Received 26 August 2016, Revised 7 March 2017, Accepted 8 March 2017, Available online 8 March 2017, Version of Record 17 March 2017.

Paper URL: https://doi.org/10.1016/j.eswa.2017.03.020