Document clustering method using dimension reduction and support vector clustering to overcome sparseness

Authors:

Highlights:

• This study proposes a new method to overcome the sparsity problem in document clustering.

• We build a combined method using dimension reduction, K-means clustering, and support vector clustering (SVC).

• In particular, we attempt to overcome the sparseness in patent document clustering.

• First, we conduct an experiment using news data from the UCI Machine Learning Repository.

• Second, we carry out patent clustering on retrieved patent documents.

Abstract

This study proposes a new method to overcome the sparsity problem in document clustering. We build a combined method using dimension reduction, K-means clustering, and support vector clustering (SVC). In particular, we attempt to overcome the sparseness in patent document clustering. First, we conduct an experiment using news data from the UCI Machine Learning Repository. Second, we carry out patent clustering on retrieved patent documents.
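
To make the sparseness problem and the silhouette measure concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes raw term counts, Euclidean distance, and four invented toy documents; the names `doc_term_matrix`, `sparsity`, and `mean_silhouette` are hypothetical helpers. The dimension reduction, K-means, and SVC steps of the proposed method are deliberately left out, and cluster labels are supplied by hand.

```python
import math
from collections import Counter


def doc_term_matrix(docs):
    """Return (vocabulary, rows) of raw term counts, one row per document."""
    tokenized = [d.lower().split() for d in docs]
    vocab = sorted({w for toks in tokenized for w in toks})
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        rows.append([counts[w] for w in vocab])  # Counter yields 0 for absent terms
    return vocab, rows


def sparsity(matrix):
    """Fraction of zero cells in the document-term matrix."""
    cells = [v for row in matrix for v in row]
    return sum(1 for v in cells if v == 0) / len(cells)


def mean_silhouette(points, labels):
    """Mean silhouette coefficient using Euclidean distance."""
    n = len(points)
    scores = []
    for i in range(n):
        # Collect distances from point i to every other point, grouped by cluster.
        dist_by_cluster = {}
        for j in range(n):
            if i != j:
                d = math.dist(points[i], points[j])
                dist_by_cluster.setdefault(labels[j], []).append(d)
        own = dist_by_cluster.pop(labels[i], None)
        if not own or not dist_by_cluster:
            scores.append(0.0)  # singleton cluster or only one cluster
            continue
        a = sum(own) / len(own)  # mean distance within the point's own cluster
        b = min(sum(d) / len(d) for d in dist_by_cluster.values())  # nearest other cluster
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / n


# Toy documents invented for illustration only.
docs = [
    "patent claims describe a sensor device",
    "sensor device measures temperature",
    "news article reports election results",
    "election results surprise voters",
]
vocab, matrix = doc_term_matrix(docs)
print("vocabulary size:", len(vocab))
print("sparsity:", round(sparsity(matrix), 3))
print("silhouette (topic grouping):", round(mean_silhouette(matrix, [0, 0, 1, 1]), 3))
print("silhouette (mixed grouping):", round(mean_silhouette(matrix, [0, 1, 0, 1]), 3))
```

Even with four short documents, most cells of the matrix are zero, which is the sparseness the highlights refer to. The silhouette value is higher when documents are grouped by topic than when the topics are mixed, which is how the measure can be used to compare clusterings.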

Keywords: Document clustering, Sparseness problem, Patent clustering, Dimension reduction, K-means clustering based on support vector clustering, Silhouette measure

Review history: Available online 22 November 2013.

Official paper URL: https://doi.org/10.1016/j.eswa.2013.11.018