Locally discriminative topic modeling
作者:
Highlights:
•
摘要
Topic modeling is a powerful tool for discovering the underlying or hidden structure in text corpora. Typical algorithms for topic modeling include probabilistic latent semantic analysis (PLSA) and latent Dirichlet allocation (LDA). Despite their different inspirations, both approaches are instances of generative model, whereas the discriminative structure of the documents is ignored. In this paper, we propose locally discriminative topic model (LDTM), a novel topic modeling approach which considers both generative and discriminative structures of the data space. Different from PLSA and LDA in which the topic distribution of a document is dependent on all the other documents, LDTM takes a local perspective that the topic distribution of each document is strongly dependent on its neighbors. By modeling the local relationships of documents within each neighborhood via a local linear model, we learn topic distributions that vary smoothly along the geodesics of the data manifold, and can better capture the discriminative structure in the data. The experimental results on text clustering and web page categorization demonstrate the effectiveness of our proposed approach.
论文关键词:Topic modeling,Generative,Discriminative,Local learning
论文评审过程:Received 25 June 2010, Revised 25 January 2011, Accepted 29 April 2011, Available online 20 May 2011.
论文官网地址:https://doi.org/10.1016/j.patcog.2011.04.029