Beyond cluster labeling: Semantic interpretation of clusters’ contents using a graph representation
Abstract
Efficient clustering algorithms have been developed to automatically group documents into subgroups (clusters). Once clustering has been performed, an important additional step is to help users make sense of the obtained clusters. Existing methods address this issue by assigning to each cluster a flat list of descriptive terms (labels) extracted from the documents, most often using statistical techniques borrowed from the field of feature selection or reduction. A limitation of these unstructured descriptions of clusters’ contents is that they do not account for the meaningful relationships between the terms. In contrast, we propose a graph representation, which makes the clusters easier to interpret by putting the descriptive terms in context and by performing some simple network analysis. Our experiments reveal that the proposed method allows for a deeper level of interpretation, both when the clusters under study are homogeneous and when they are heterogeneous. In addition, evaluation procedures presented in the paper show that the graph-based representation of each cluster, while very concise, still reflects the original content of the cluster quite faithfully.
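To make the idea of putting descriptive terms in context more concrete, here is a minimal Python sketch. It is not the paper's method: the abstract does not say how the graph is built, so this sketch assumes that two labels are linked whenever they occur in the same document of a cluster, and it uses weighted degree as one example of simple network analysis. The function names (`build_term_graph`, `weighted_degree`) and the toy documents are hypothetical.

```python
from collections import Counter
from itertools import combinations


def build_term_graph(documents, labels, min_cooccurrence=1):
    """Return {(term_a, term_b): weight} for label pairs sharing a document.

    Assumption: an edge means document-level co-occurrence; the paper may
    define relationships between terms differently.
    """
    label_set = set(labels)
    edge_weights = Counter()
    for text in documents:
        # Naive whitespace tokenization, enough for an illustration.
        present = sorted(label_set & set(text.lower().split()))
        for term_a, term_b in combinations(present, 2):
            edge_weights[(term_a, term_b)] += 1
    return {e: w for e, w in edge_weights.items() if w >= min_cooccurrence}


def weighted_degree(edges):
    """Sum the edge weights incident to each term."""
    degree = Counter()
    for (term_a, term_b), weight in edges.items():
        degree[term_a] += weight
        degree[term_b] += weight
    return degree


# Toy cluster: three short documents and a flat list of labels.
cluster_docs = [
    "graph clustering of documents",
    "clustering documents with labels",
    "graph labels for visualization",
]
cluster_labels = ["graph", "clustering", "documents", "labels"]

edges = build_term_graph(cluster_docs, cluster_labels)
degree = weighted_degree(edges)

for term, score in degree.most_common():
    print(term, score)
```

Where a flat label list would present the four terms as equals, the edge weights here show which labels tend to appear together, and the degree ranking hints at which terms are most central to the cluster.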
Keywords: Cluster labeling, Clustering, Exploratory data analysis, Visualization, Network analysis
Review history: Received 11 February 2013, Revised 6 November 2013, Accepted 7 November 2013, Available online 20 November 2013.
Paper URL: https://doi.org/10.1016/j.knosys.2013.11.005