A K-partitioning algorithm for clustering large-scale spatio-textual data

Authors:

Highlights:

• This is the first study of the problem of clustering large-scale spatio-textual data, which has many real-world applications such as location-based data cleaning.

• A modified version of the k-means clustering algorithm is developed for spatio-textual data using the expected pairwise distance.

• Experimentally, our algorithm is not only fast enough to handle a massive spatio-textual dataset but also fairly effective in terms of clustering quality.

Keywords: Spatio-textual similarity, K-means clustering, K-medoids clustering, K-prototypes clustering, Expected distance, Grid partitioning

Review history: Received 10 December 2015, Revised 24 June 2016, Accepted 8 August 2016, Available online 28 September 2016, Version of Record 15 October 2016.

Official paper URL: https://doi.org/10.1016/j.is.2016.08.003