Novel term weighting schemes for document representation based on ranking of terms and Fuzzy logic with semantic relationship of terms

Authors:

Highlights:

• A term-weighting scheme based on the ranking of terms is proposed.

• A second term-weighting scheme is proposed that combines the ranking of terms with fuzzy logic and the semantic relationships between terms.

• Similar word groups for each document are identified using the Fuzzy C-Means clustering algorithm.

• The traditional K-means and K-means++ clustering algorithms are applied to the Reuters-8 and WebKB data sets.

• Clustering performance is analyzed using Accuracy, Entropy, Recall and F-measure.
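
The highlights name Entropy and F-measure as evaluation metrics but do not give their formulas. As a rough illustration only, the sketch below uses definitions that are common in the document-clustering literature: size-weighted per-cluster entropy, and a class-weighted best-match F-measure. These definitions, the function names `cluster_entropy` and `cluster_f_measure`, and the toy labels are assumptions for illustration, not necessarily what the paper uses.

```python
from collections import Counter, defaultdict
from math import log2


def cluster_entropy(labels_true, labels_pred):
    """Size-weighted average entropy of the true-class mix inside each cluster.

    0.0 means every cluster contains a single class; higher is worse.
    (Assumed definition, not taken from the paper.)
    """
    clusters = defaultdict(list)
    for truth, cluster in zip(labels_true, labels_pred):
        clusters[cluster].append(truth)

    n = len(labels_true)
    total = 0.0
    for members in clusters.values():
        size = len(members)
        counts = Counter(members)
        h = -sum((c / size) * log2(c / size) for c in counts.values())
        total += (size / n) * h
    return total


def cluster_f_measure(labels_true, labels_pred):
    """For each true class, take the best F1 over all clusters, then
    average those scores weighted by class size.

    (Assumed definition, not taken from the paper.)
    """
    n = len(labels_true)
    class_sizes = Counter(labels_true)
    cluster_sizes = Counter(labels_pred)
    overlap = Counter(zip(labels_true, labels_pred))

    score = 0.0
    for cls, n_cls in class_sizes.items():
        best = 0.0
        for clu, n_clu in cluster_sizes.items():
            n_both = overlap[(cls, clu)]
            if n_both == 0:
                continue
            precision = n_both / n_clu
            recall = n_both / n_cls
            best = max(best, 2 * precision * recall / (precision + recall))
        score += (n_cls / n) * best
    return score


# Toy labels (hypothetical): two classes, four documents.
truth = ["a", "a", "b", "b"]
perfect = [0, 0, 1, 1]   # each cluster holds exactly one class
mixed = [0, 1, 0, 1]     # each cluster holds one document of each class
```

Under these assumed definitions, a clustering that separates the classes perfectly has entropy 0 and F-measure 1, while a clustering that mixes them evenly scores worse on both.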


Keywords: Document representation, Document clustering, Term weighting, F-Measure, Entropy, K-means

Review history: Received 7 November 2018, Revised 5 July 2019, Accepted 10 July 2019, Available online 11 July 2019, Version of Record 18 July 2019.

Official paper URL: https://doi.org/10.1016/j.eswa.2019.07.022