Concept decompositions for short text clustering by identifying word communities

Authors:

Highlights:

• A new concept decomposition method, WordCom, is proposed.

• It creates concept vectors by identifying semantic word communities from a weighted word co-occurrence network.

• It is not only robust to the sparsity of short texts but also overcomes the curse of dimensionality.

• It scales to a large number of short text inputs because the concept vectors are obtained from the term-term space.

• Experimental tests have shown that the proposed method outperforms state-of-the-art algorithms.
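The idea in the highlights above (build a weighted word co-occurrence network, find word communities, treat each community as a concept vector) can be illustrated with a minimal sketch. This is not the paper's WordCom implementation: the source does not give the edge weighting or the community detection algorithm, so the sketch assumes raw co-occurrence counts as weights and uses connected components of a thresholded graph as a crude stand-in for community detection. The function names, the `min_weight` parameter, and the toy texts are all hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations
import math


def cooccurrence_weights(docs):
    """Count, for each unordered word pair, how many texts contain both words."""
    weights = Counter()
    for doc in docs:
        words = sorted(set(doc.lower().split()))
        for u, v in combinations(words, 2):
            weights[(u, v)] += 1
    return weights


def word_communities(weights, min_weight=2):
    """Group words by connected components over edges with weight >= min_weight.

    Assumption: this replaces a real community detection algorithm.
    """
    adj = defaultdict(set)
    for (u, v), w in weights.items():
        if w >= min_weight:
            adj[u].add(v)
            adj[v].add(u)
    seen, communities = set(), []
    for start in sorted(adj):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            component.add(node)
            for neighbour in adj[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        communities.append(component)
    return communities


def assign(doc, communities):
    """Return the index of the community most similar to the text, or None.

    Each community is treated as a binary concept vector over words, and
    similarity is the cosine between that vector and the text's binary vector.
    """
    words = set(doc.lower().split())
    if not words:
        return None
    best, best_score = None, 0.0
    for i, community in enumerate(communities):
        score = len(words & community) / math.sqrt(len(words) * len(community))
        if score > best_score:
            best, best_score = i, score
    return best


# Hypothetical toy corpus of short texts.
docs = [
    "apple banana fruit",
    "banana fruit juice",
    "apple fruit juice",
    "python code bug",
    "code bug fix",
    "python code fix",
]
communities = word_communities(cooccurrence_weights(docs))
labels = [assign(d, communities) for d in docs]
```

Note that the communities are computed from word pairs only, so the cost of that step depends on the vocabulary rather than on the number of texts, which is the intuition behind the scalability highlight.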

Keywords: Short text clustering, Concept decomposition, Spherical k-means, Semantic word community, Community detection

Review history: Received 11 July 2016, Revised 23 August 2017, Accepted 30 September 2017, Available online 10 October 2017, Version of Record 8 January 2018.

Official paper URL: https://doi.org/10.1016/j.patcog.2017.09.045