Improving short text classification by learning vector representations of both words and hidden topics

Authors:

Highlights:

• We exploit the knowledge from a topic-consistent corpus for topic modeling and use the topics to enrich the corpus and the short texts.

• We learn the vector representations of both words and topics interactively on the enriched corpus.

• We use the vectors of the words and topics to represent the features of short texts for training and classification.

• Our method performs better than many baselines.
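The highlights above describe a pipeline: enrich each text with topics, learn vectors for words and topics, then build features from those vectors. A minimal sketch of the last step follows, under loudly stated assumptions: the vectors are toy hand-made values rather than learned ones, the word-to-topic table stands in for a topic model's output, and averaging is only one plausible way to combine the vectors (this listing does not say how the paper combines them). All names (`WORD_VECS`, `TOPIC_VECS`, `WORD_TOPIC`, `enrich`, `text_features`) are hypothetical.

```python
from typing import Dict, List

Vector = List[float]

# Toy vectors (assumption): in practice these would be learned on the enriched corpus.
WORD_VECS: Dict[str, Vector] = {
    "stock": [1.0, 0.0],
    "market": [0.8, 0.2],
    "goal": [0.0, 1.0],
}
TOPIC_VECS: Dict[str, Vector] = {
    "T_finance": [0.9, 0.1],
    "T_sports": [0.1, 0.9],
}

# Hypothetical word -> topic assignment, standing in for a topic model's output.
WORD_TOPIC: Dict[str, str] = {
    "stock": "T_finance",
    "market": "T_finance",
    "goal": "T_sports",
}


def enrich(tokens: List[str]) -> List[str]:
    """Insert the assigned topic token after each word that has one."""
    enriched: List[str] = []
    for token in tokens:
        enriched.append(token)
        topic = WORD_TOPIC.get(token)
        if topic is not None:
            enriched.append(topic)
    return enriched


def text_features(tokens: List[str], dim: int = 2) -> Vector:
    """Average the vectors of all known words and topics in the enriched text."""
    vectors: List[Vector] = []
    for token in enrich(tokens):
        if token in WORD_VECS:
            vectors.append(WORD_VECS[token])
        elif token in TOPIC_VECS:
            vectors.append(TOPIC_VECS[token])
    if not vectors:
        # No known word or topic: fall back to a zero vector.
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


if __name__ == "__main__":
    print(enrich(["stock", "goal"]))
    print(text_features(["stock", "market"]))
```

The resulting fixed-length vector could then be passed to any standard classifier; which classifier the paper uses is not stated in this listing.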


Keywords: Short texts, Topic model, Data enrich, Word and topic vectors

Review history: Received 13 September 2015, Revised 25 March 2016, Accepted 27 March 2016, Available online 30 March 2016, Version of Record 23 April 2016.

Paper URL: https://doi.org/10.1016/j.knosys.2016.03.027