What is this Cluster about? Explaining textual clusters by extracting relevant keywords

Authors:

Highlights:

• We propose score-based, knowledge-based, and (semi-)supervised methods for explaining text clusters.

• We show how to use external knowledge to expand score-based explanations using an ILP model.

• The ILP model with external knowledge can control the diversity and consistency of explanations.

• In the semi-supervised approach, our metrics drop by only 9% when the number of labels is reduced by 70%.

• We propose a modification of the current evaluation metrics to reduce bias towards common labels.
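
To make the idea of a score-based explanation concrete, the following is a minimal sketch, not the paper's actual scoring function. It assumes a simple TF-IDF-style score (a term's relative frequency inside a cluster, weighted by how few clusters contain it) as a stand-in. The function name `top_keywords`, the whitespace tokenizer, and the toy clusters are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def top_keywords(clusters, k=3):
    """Return the k highest-scoring terms per cluster.

    Assumption: the score is relative in-cluster frequency times
    log(number of clusters / number of clusters containing the term).
    This is an illustrative stand-in, not the method from the paper.
    """
    # Term counts per cluster, using naive lowercase whitespace tokenization.
    counts = {
        cid: Counter(tok for doc in docs for tok in doc.lower().split())
        for cid, docs in clusters.items()
    }
    # Number of clusters in which each term occurs at least once.
    cluster_freq = Counter(term for c in counts.values() for term in c)
    n = len(counts)

    result = {}
    for cid, c in counts.items():
        total = sum(c.values())
        scores = {
            term: (tf / total) * math.log(n / cluster_freq[term])
            for term, tf in c.items()
        }
        # Sort by descending score; break ties alphabetically for determinism.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        result[cid] = ranked[:k]
    return result


# Hypothetical toy data: two clusters of short documents.
clusters = {
    "sports": ["the match ended in a draw", "the team won the match"],
    "finance": ["the bank raised rates", "the bank reported profit"],
}
keywords = top_keywords(clusters, k=2)
```

Under this scoring, a term that appears in every cluster (such as "the" here) receives a score of zero, so frequent but uninformative words do not dominate the explanation. The knowledge-based expansion and the ILP model mentioned in the highlights are not covered by this sketch.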

Keywords: Document clustering, Text analytics, Explainability, Cluster summarization, Cluster labelling, Clusters explanations

Review history: Received 3 July 2020, Revised 9 October 2020, Accepted 23 July 2021, Available online 4 August 2021, Version of Record 12 August 2021.

Official paper URL: https://doi.org/10.1016/j.knosys.2021.107342