Forum latent Dirichlet allocation for user interest discovery

Abstract

The popularity of online forums offers a good opportunity to learn user interests, which can be used in many business scenarios such as product or news recommendation. Many approaches exist for inferring forum topics and users' interests; among them, Author-Topic (AT)-style models are the most popular. However, a thread in an online forum consists of a root post and a number of response posts, each of which may or may not be relevant to the root post. The AT assumption that every response post is generated from its author's interest topics is therefore incomplete. In this paper, we distinguish between a user's serious and unserious interest topics. We argue that the topic of a relevant response post is jointly determined by its author's serious interest topics and the topics of its root post, whereas the topic of an irrelevant response post is determined only by its author's unserious interest topics. Based on these assumptions, we propose Forum-LDA, which jointly models the generative process of root posts, relevant response posts, and irrelevant response posts. As a result, the model can not only learn more coherent topics and serious interests, but also identify unserious users who publish many irrelevant posts. Extensive experiments on a real forum dataset demonstrate the advantages of our model in tasks such as user interest discovery and unserious user discovery.
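
The two assumptions about response posts can be illustrated with a toy sampling routine. This is only a minimal sketch and not the Forum-LDA model itself: the abstract does not say how the author's serious interests and the root post's topics are combined, so the element-wise product used here is an assumption, as are the function names and the `p_relevant` parameter (a hypothetical probability that a response is relevant to its root post).

```python
import random


def sample_topic(weights, rng):
    """Draw a topic index from a non-negative, unnormalized weight vector."""
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]


def response_topic(root_topics, serious, unserious, p_relevant, rng):
    """Sample (is_relevant, topic) for a single response post.

    Assumption: "jointly determined" is modeled as the element-wise product
    of the author's serious interests and the root post's topic weights.
    The abstract does not specify the actual combination rule.
    """
    is_relevant = rng.random() < p_relevant
    if is_relevant:
        # Relevant response: serious interests combined with root-post topics.
        # At least one product must be positive, or sampling will fail.
        weights = [s * r for s, r in zip(serious, root_topics)]
    else:
        # Irrelevant response: only the author's unserious interests matter.
        weights = unserious
    return is_relevant, sample_topic(weights, rng)


rng = random.Random(0)
root = [1.0, 0.0, 0.0]       # root post is entirely about topic 0
serious = [0.5, 0.3, 0.2]    # author's serious interest weights
unserious = [0.0, 0.0, 1.0]  # author's unserious interest weights
print(response_topic(root, serious, unserious, p_relevant=0.7, rng=rng))
```

In this toy setup a relevant response can only land on topic 0, because the root post puts no weight on the other topics, while an irrelevant response always lands on topic 2, regardless of what the root post discusses.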

Keywords: User interest, Topic model, Forum content analysis

Review history: Received 27 November 2016, Revised 12 April 2017, Accepted 13 April 2017, Available online 14 April 2017, Version of Record 2 May 2017.

Paper URL: https://doi.org/10.1016/j.knosys.2017.04.006