Textual data summarization using the Self-Organized Co-Clustering model

Authors:

Highlights:

• The SOCC model is a novel approach for clustering textual data sets.

• It consists in co-clustering the classic document-term frequency matrix, i.e., in simultaneously clustering the documents and the terms they are made of.

• The crossing of a document-cluster and a term-cluster is called a block.

• A structure with meaningful and non-meaningful blocks is proposed.

• The resulting co-clustering offers highly interpretable results for the user.
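
To make the notion of a block concrete, here is a minimal sketch in plain Python. It builds a small document-term frequency matrix and sums the counts in each crossing of a document-cluster and a term-cluster. The toy corpus, the helper names `document_term_matrix` and `block_totals`, and the hand-picked cluster labels are all assumptions made for illustration. The SOCC model itself infers the clusters, and that inference is not reproduced here.

```python
from collections import Counter


def document_term_matrix(docs):
    """Return (vocab, matrix) where matrix[i][j] counts term j in document i."""
    tokenized = [d.lower().split() for d in docs]
    vocab = sorted({w for tokens in tokenized for w in tokens})
    counts = [Counter(tokens) for tokens in tokenized]
    return vocab, [[c[w] for w in vocab] for c in counts]


def block_totals(matrix, doc_labels, term_labels):
    """Sum the counts falling in each (document-cluster, term-cluster) block."""
    n_doc_clusters = max(doc_labels) + 1
    n_term_clusters = max(term_labels) + 1
    totals = [[0] * n_term_clusters for _ in range(n_doc_clusters)]
    for i, row in enumerate(matrix):
        for j, count in enumerate(row):
            totals[doc_labels[i]][term_labels[j]] += count
    return totals


# Toy corpus (hypothetical): two documents about pets, two about finance.
docs = [
    "cat dog cat",
    "dog cat",
    "stock market stock",
    "market stock",
]
vocab, X = document_term_matrix(docs)

# Hand-picked labels (an assumption; a co-clustering model would estimate them).
doc_labels = [0, 0, 1, 1]    # one label per document
term_labels = [0, 0, 1, 1]   # one label per term, in the order of `vocab`

blocks = block_totals(X, doc_labels, term_labels)
for k, row in enumerate(blocks):
    print(f"document-cluster {k}: {row}")
```

In this toy example all counts fall in the two diagonal blocks, while the off-diagonal blocks are empty. This only illustrates the block structure of a co-clustered matrix, not the paper's criterion for meaningful and non-meaningful blocks.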

Keywords: Co-clustering, Document-term matrix, Latent block model

Review history: Received 12 July 2019, Revised 27 January 2020, Accepted 24 February 2020, Available online 29 February 2020, Version of Record 29 February 2020.

Official paper URL: https://doi.org/10.1016/j.patcog.2020.107315