Evaluation of clustering and topic modeling methods over health-related tweets and emails

Authors:

Highlights:

• Evaluation of topic modeling and clustering on health-related tweets and emails.

• Topic modeling: LSI, LDA, BTM, GibbsLDA, Online LDA, Online Twitter LDA, and GSDMM.

• Clustering: k-means with two feature representations (TF-IDF and Doc2Vec).

• The evaluation is based on two internal and five external cluster validity indices.
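The highlights distinguish internal from external cluster validity indices without naming them here. As a minimal sketch of that distinction, the block below computes one index of each kind: purity (external, needs gold labels) and the mean silhouette coefficient (internal, needs only the document vectors and the cluster assignment). Both indices, the function names `purity` and `silhouette`, and the toy data are illustrative assumptions, not necessarily what the paper uses.

```python
from collections import Counter
from math import dist


def purity(pred, gold):
    """External index: share of items that fall in the majority gold class of their cluster."""
    clusters = {}
    for p, g in zip(pred, gold):
        clusters.setdefault(p, Counter())[g] += 1
    return sum(max(c.values()) for c in clusters.values()) / len(pred)


def silhouette(points, pred):
    """Internal index: mean silhouette coefficient with Euclidean distance."""
    labels = set(pred)
    scores = []
    for i, (x, ci) in enumerate(zip(points, pred)):
        # Mean distance from x to the members of each cluster, excluding x itself.
        mean_dist = {}
        for c in labels:
            ds = [dist(x, y)
                  for j, (y, cj) in enumerate(zip(points, pred))
                  if cj == c and j != i]
            if ds:
                mean_dist[c] = sum(ds) / len(ds)
        # Convention: a singleton cluster (or a single overall cluster) scores 0.
        if ci not in mean_dist or len(mean_dist) < 2:
            scores.append(0.0)
            continue
        a = mean_dist[ci]                                   # cohesion
        b = min(d for c, d in mean_dist.items() if c != ci)  # separation
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data: four 2-D "document vectors", two predicted clusters.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
pred = [0, 0, 1, 1]
gold = ["a", "a", "b", "a"]

print(purity(pred, gold))        # 0.75
print(silhouette(points, pred))  # a value between 0.8 and 0.9
```

The silhouette is high because the two clusters are geometrically well separated, while purity is lower because one cluster mixes two gold classes. The two kinds of index can therefore disagree, which is one reason to report both.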

Keywords: Topic modeling, Clustering, Internal cluster indices, External cluster indices, Natural language processing

Review history: Received 31 May 2020, Revised 30 March 2021, Accepted 5 May 2021, Available online 7 May 2021, Version of Record 21 May 2021.

Paper URL: https://doi.org/10.1016/j.artmed.2021.102096