Stamantic clustering: Combining statistical and semantic features for clustering of large text datasets

Authors:

Highlights:

• A text clustering technique based on statistical and semantic features is proposed.

• Combining the statistical and semantic features improves text clustering.

• The formation of lexical chains from WordNet captures important terms.

• Semantic relations such as synonymy and hypernymy help to reduce dimensionality.
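The last highlight says that mapping related terms onto shared concepts reduces dimensionality. The toy Python sketch below illustrates only that effect, under stated assumptions: it is not the authors' method, it does not build lexical chains, and `CONCEPT_MAP` is a hypothetical hand-written dictionary standing in for WordNet synonym/hypernym lookups. It computes plain TF-IDF vectors (tf = count / document length, idf = log(N / df)) on raw terms and on concept-mapped terms, then compares vocabulary size and cosine similarity.

```python
import math
from collections import Counter

# HYPOTHETICAL toy map standing in for WordNet synonym/hypernym lookups;
# a real system would query WordNet instead of this hand-written dict.
CONCEPT_MAP = {
    "car": "vehicle",
    "automobile": "vehicle",
    "truck": "vehicle",
    "film": "movie",
    "picture": "movie",
}


def tokenize(text: str) -> list[str]:
    """Lowercase whitespace tokenizer (deliberately simplistic)."""
    return text.lower().split()


def to_concepts(tokens: list[str], concept_map: dict[str, str]) -> list[str]:
    """Replace each token by its concept label, or keep it if unmapped."""
    return [concept_map.get(t, t) for t in tokens]


def tfidf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Plain TF-IDF: tf = count / doc length, idf = log(N / df)."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # document frequency counts each term once per doc
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        length = len(doc)
        vectors.append(
            {t: (c / length) * math.log(n / df[t]) for t, c in counts.items()}
        )
    return vectors


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def vocabulary_size(docs: list[list[str]]) -> int:
    """Number of distinct terms, i.e. the dimensionality of the TF-IDF space."""
    return len({t for doc in docs for t in doc})


if __name__ == "__main__":
    texts = ["car engine repair", "automobile engine repair", "film review"]
    raw = [tokenize(t) for t in texts]
    mapped = [to_concepts(d, CONCEPT_MAP) for d in raw]

    raw_vecs, mapped_vecs = tfidf(raw), tfidf(mapped)
    print("vocabulary:", vocabulary_size(raw), "->", vocabulary_size(mapped))
    print("sim(doc0, doc1) raw:    %.3f" % cosine(raw_vecs[0], raw_vecs[1]))
    print("sim(doc0, doc1) mapped: %.3f" % cosine(mapped_vecs[0], mapped_vecs[1]))
```

On these three toy documents the vocabulary shrinks from 6 to 5 terms, and the cosine similarity between the first two documents rises from about 0.21 to 1.0 because "car" and "automobile" collapse into a single dimension. Real WordNet lookups would additionally need word-sense disambiguation, which this sketch ignores.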

Keywords: Document clustering, Semantic relations, Lexical chains, TF-IDF, WordNet, Big data

Article history: Received 3 February 2020, Revised 20 January 2021, Accepted 10 February 2021, Available online 18 February 2021, Version of Record 2 March 2021.

Official URL: https://doi.org/10.1016/j.eswa.2021.114710