Efficient algorithms for graph regularized PLSA for probabilistic topic modeling
Authors:
Highlights:
• We propose efficient algorithms for graph-regularized PLSA (GPLSA) as a general framework for single- or multi-modality topic analysis, where the graph regularizer is based on the divergence between discrete probability distributions. Similarities between topics are enforced in a joint latent space constrained by the graph, and topic distributions are enhanced by their nearest neighbors on the graph.
• We show improved results with the L1 regularizer over the baseline. With the L2 divergence as the regularizer, our GPLSA algorithm is more efficient than the existing method based on graph-regularized non-negative matrix factorization (GNMF) and comes with a convergence guarantee. We further describe a new algorithm that uses symmetric KL divergence as the regularizer and demonstrate that it is more effective than the L2 divergence (a schematic form of the regularized objective follows this list).
• The proposed algorithms naturally extend probabilistic topic analysis from a single modality to multiple modalities. By learning a joint latent space for documents of different modalities, our method captures similarities between documents across modalities. The learned topic representation leverages the compatible yet complementary conceptual themes of each modality, and is therefore more effective than methods relying on features derived from direct concatenation of modalities (a small code sketch of the graph regularizer also follows this list).
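To make the framework concrete, the following is a schematic form of the graph-regularized objective as the highlights describe it. The notation (the trade-off weight λ, the graph weights W_ij, and the divergence D) is our own shorthand, and the paper's exact formulation may differ:

```latex
% Schematic GPLSA objective (our notation; the paper's exact form may differ):
% maximize the PLSA log-likelihood while penalizing divergence between the
% topic distributions of documents that are neighbors on the graph.
\[
\mathcal{L} \;=\; \sum_{d}\sum_{w} n(d,w)\,\log \sum_{z} P(w \mid z)\,P(z \mid d)
\;-\; \lambda \sum_{i,j} W_{ij}\, D\!\left( P(\cdot \mid d_i) \,\|\, P(\cdot \mid d_j) \right),
\]
where $n(d,w)$ counts word $w$ in document $d$, $W_{ij}$ weights the edge between
documents $d_i$ and $d_j$ in a nearest-neighbor graph, $\lambda > 0$ trades off fit
against graph smoothness, and $D$ is the $\ell_1$, $\ell_2$, or symmetric KL
divergence, e.g.
$D_{\mathrm{sKL}}(p \,\|\, q) = \tfrac{1}{2}\bigl(\mathrm{KL}(p\|q) + \mathrm{KL}(q\|p)\bigr)$.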
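And here is a minimal sketch of how such a regularizer can be evaluated over a nearest-neighbor document graph, using the symmetric KL divergence. The function names and the toy data are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def symmetric_kl(p, q, eps=1e-12):
    """Symmetric KL divergence 0.5*(KL(p||q) + KL(q||p)) between
    two discrete distributions p and q (1-D arrays summing to 1)."""
    p = np.asarray(p, dtype=float) + eps  # eps avoids log(0)
    q = np.asarray(q, dtype=float) + eps
    return 0.5 * (np.sum(p * np.log(p / q)) + np.sum(q * np.log(q / p)))

def graph_regularizer(theta, W):
    """Evaluate sum_{i,j} W[i, j] * D(theta[i], theta[j]) over a
    nearest-neighbor document graph.

    theta : (n_docs, n_topics) array, row d holds P(z | d)
    W     : (n_docs, n_docs) symmetric adjacency/weight matrix
    """
    total = 0.0
    rows, cols = np.nonzero(W)  # iterate only over graph edges
    for i, j in zip(rows, cols):
        total += W[i, j] * symmetric_kl(theta[i], theta[j])
    return total

# Toy usage: 3 documents, 2 topics, a chain graph d0--d1--d2.
theta = np.array([[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]])
W = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
print(graph_regularizer(theta, W))  # dominated by the dissimilar d1--d2 edge
```

Minimizing this term alongside the PLSA likelihood pulls the topic distributions of neighboring documents toward each other, which is the smoothing effect the highlights describe.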
Keywords: Probabilistic latent semantic analysis, Graph regularization, Topic analysis, Clustering
Article history: Received 20 October 2017, Revised 20 July 2018, Accepted 5 September 2018, Available online 20 September 2018, Version of Record 28 September 2018.
DOI: https://doi.org/10.1016/j.patcog.2018.09.004