A K-partitioning algorithm for clustering large-scale spatio-textual data
Authors:
Highlights:
• This work is the first to study the problem of clustering large-scale spatio-textual data, which has many real-world applications such as location-based data cleaning.
• A modified version of the k-means clustering algorithm is developed for spatio-textual data using the expected pairwise distance.
• Experiments show that our algorithm is fast enough to handle a massive spatio-textual dataset and is also fairly effective in terms of clustering quality.
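
To make the second highlight more concrete, here is a minimal Python sketch of assigning an object to the cluster with the smallest expected pairwise distance. It is an illustration under stated assumptions, not the paper's actual formulation: the weighted combination of normalized Euclidean distance and Jaccard distance, the weight `alpha`, the normalizer `max_dist`, and all function names are hypothetical choices made for this example.

```python
import math

def spatio_textual_distance(a, b, alpha=0.5, max_dist=1.0):
    """Distance between two objects of the form ((x, y), set_of_keywords).

    Assumption: a weighted sum of normalized Euclidean distance and
    Jaccard distance; the paper may define the distance differently.
    """
    (ax, ay), a_terms = a
    (bx, by), b_terms = b
    # Spatial part, scaled into [0, 1] by the assumed maximum distance.
    spatial = min(math.hypot(ax - bx, ay - by) / max_dist, 1.0)
    # Textual part: Jaccard distance between the keyword sets.
    union = a_terms | b_terms
    textual = 1.0 - len(a_terms & b_terms) / len(union) if union else 0.0
    return alpha * spatial + (1.0 - alpha) * textual

def expected_distance(obj, cluster, **kw):
    """Mean pairwise distance from obj to every member of a non-empty cluster."""
    return sum(spatio_textual_distance(obj, m, **kw) for m in cluster) / len(cluster)

def assign(obj, clusters, **kw):
    """Index of the cluster with the smallest expected distance to obj."""
    return min(range(len(clusters)),
               key=lambda i: expected_distance(obj, clusters[i], **kw))

# Hypothetical toy data: two clusters and one object to assign.
clusters = [
    [((0.0, 0.0), {"cafe"}), ((0.1, 0.0), {"cafe", "coffee"})],
    [((0.9, 0.9), {"gym"})],
]
obj = ((0.05, 0.0), {"coffee"})
nearest = assign(obj, clusters)
```

Computing the mean over all members naively costs time linear in the cluster size per object; the paper's reported speed presumably comes from avoiding that cost, which this sketch does not attempt.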
Keywords: Spatio-textual similarity, K-means clustering, K-medoids clustering, K-prototypes clustering, Expected distance, Grid partitioning
Review history: Received 10 December 2015, Revised 24 June 2016, Accepted 8 August 2016, Available online 28 September 2016, Version of Record 15 October 2016.
Official paper URL: https://doi.org/10.1016/j.is.2016.08.003