Partially collapsed Gibbs sampling for latent Dirichlet allocation

Authors:

Highlights:

• We propose an enhanced Latent Dirichlet Allocation (LDA) model for text mining.

• We identify latent topics among heterogeneous collections of discrete data.

• We address highly multimodal latent topic distributions as well as unimodal ones.

• We obtain unbiased estimates under flexible modeling for text corpora.

• We use the method of partial collapse and Dirichlet process mixtures.

Keywords: Bayesian analysis, Latent Dirichlet allocation, Dirichlet process mixture, Partial collapse, Machine learning, Natural language processing

Review history: Received 12 September 2018, Revised 14 April 2019, Accepted 14 April 2019, Available online 17 April 2019, Version of Record 2 May 2019.

Paper URL: https://doi.org/10.1016/j.eswa.2019.04.028