Evaluation of clustering and topic modeling methods over health-related tweets and emails
Authors:
Highlights:
• Evaluation of topic modeling and clustering on health-related tweets and emails.
• Topic modeling: LSI, LDA, BTM, GibbsLDA, Online LDA, Online Twitter LDA, and GSDMM.
• Clustering: k-means with two feature representations (TF-IDF and Doc2Vec).
• The evaluation is based on two internal and five external cluster validity indices.
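To make the distinction in the last highlight concrete: an internal index scores a clustering from the data alone, while an external index compares it against gold labels. The highlights do not say which indices the paper uses, so the two below (mean silhouette as an internal index, purity as an external one) are illustrative assumptions, and the toy points and labels are hypothetical. This is a minimal sketch, not the authors' evaluation code.

```python
from collections import Counter
from math import dist


def purity(pred, gold):
    """External index: share of items that carry the majority gold label
    of the cluster they were assigned to."""
    clusters = {}
    for p, g in zip(pred, gold):
        clusters.setdefault(p, []).append(g)
    majority_total = sum(
        Counter(labels).most_common(1)[0][1] for labels in clusters.values()
    )
    return majority_total / len(pred)


def mean_silhouette(points, pred):
    """Internal index: mean silhouette coefficient with Euclidean distance.
    Assumes at least two clusters and points that are not all identical."""
    scores = []
    for i, (x, c) in enumerate(zip(points, pred)):
        # a: mean distance to the other members of the same cluster
        same = [
            dist(x, y)
            for j, (y, d) in enumerate(zip(points, pred))
            if d == c and j != i
        ]
        if not same:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        a = sum(same) / len(same)
        # b: smallest mean distance to the members of any other cluster
        other = {}
        for y, d in zip(points, pred):
            if d != c:
                other.setdefault(d, []).append(dist(x, y))
        b = min(sum(v) / len(v) for v in other.values())
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Hypothetical toy data: four 2-D feature vectors standing in for documents.
points = [(0, 0), (0, 1), (5, 5), (5, 6)]
pred = [0, 0, 1, 1]            # cluster assignments, e.g. from k-means
gold = ["a", "a", "b", "a"]    # gold topic labels

print(purity(pred, gold))             # 0.75
print(mean_silhouette(points, pred))  # close to 1 for well-separated clusters
```

Purity needs the gold labels and silhouette does not, which is why external indices are only available for labelled corpora.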
Keywords: Topic modeling, Clustering, Internal cluster indices, External cluster indices, Natural language processing
Review history: Received 31 May 2020, Revised 30 March 2021, Accepted 5 May 2021, Available online 7 May 2021, Version of Record 21 May 2021.
Official paper URL: https://doi.org/10.1016/j.artmed.2021.102096