Unsupervised Latent Dirichlet Allocation for supervised question classification

Authors:

Highlights:

• We introduce a new classification algorithm based on LDA topic models.

• The model benefits from a topic-categories mixture model in addition to the original LDA.

• The proposed model can be used to classify text into categories or domains.

• A set of 2800 Persian questions and a set of 1000 German questions from CQA websites have been annotated for our experiments.

• Our model was compared against state-of-the-art classification algorithms.

Keywords: Community-based QA, Question classification, LDA

Review history: Received 24 August 2015, Revised 21 August 2017, Accepted 4 January 2018, Available online 4 February 2018, Version of Record 4 February 2018.

Paper URL: https://doi.org/10.1016/j.ipm.2018.01.001