Textual data summarization using the Self-Organized Co-Clustering model
Authors:
Highlights:
• The SOCC model is a novel approach for clustering textual data sets.
• It co-clusters the classic document-term frequency matrix, i.e., it simultaneously clusters the documents and the terms they are made of.
• The intersection of a document cluster and a term cluster is called a block.
• A structure with meaningful and non-meaningful blocks is proposed.
• The resulting co-clustering offers highly interpretable results for the user.
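
The notion of a block can be made concrete with a small sketch. The code below is not the SOCC model itself and performs no inference: it assumes the document and term cluster labels are already given, and it only shows how a co-clustering partitions a document-term frequency matrix into blocks. The function name `block_totals`, the toy matrix, and the labels are hypothetical and chosen purely for illustration.

```python
import numpy as np

def block_totals(dtm, doc_labels, term_labels, n_doc_clusters, n_term_clusters):
    """Sum the term counts inside each (document-cluster, term-cluster) block.

    Illustrative helper only: labels are assumed to be given, not estimated.
    """
    totals = np.zeros((n_doc_clusters, n_term_clusters), dtype=dtm.dtype)
    for k in range(n_doc_clusters):
        for l in range(n_term_clusters):
            # Rows of document cluster k crossed with columns of term cluster l.
            block = dtm[np.ix_(doc_labels == k, term_labels == l)]
            totals[k, l] = block.sum()
    return totals

# Toy document-term frequency matrix: 4 documents (rows) x 5 terms (columns).
dtm = np.array([
    [3, 2, 0, 0, 1],
    [4, 1, 0, 0, 2],
    [0, 0, 5, 3, 1],
    [0, 0, 2, 4, 2],
])

# Assumed cluster assignments (hypothetical, not produced by any model).
doc_labels = np.array([0, 0, 1, 1])       # two document clusters
term_labels = np.array([0, 0, 1, 1, 2])   # three term clusters

totals = block_totals(dtm, doc_labels, term_labels, 2, 3)
print(totals)
```

In this toy example, the first two term clusters each concentrate their counts in a single document cluster, while the third term cluster is spread evenly across both. This loosely mirrors the distinction between blocks that characterize a document cluster and blocks that do not, although the model's actual definition of meaningful and non-meaningful blocks is not given here.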
Keywords: Co-clustering, Document-term matrix, Latent block model
Review history: Received 12 July 2019, Revised 27 January 2020, Accepted 24 February 2020, Available online 29 February 2020, Version of Record 29 February 2020.
Paper URL: https://doi.org/10.1016/j.patcog.2020.107315