Hashing-based clustering in high dimensional data
Authors:
Highlights:
• We modify hashing strategies to cluster high dimensional documents.
• We estimate the Jaccard similarity by counting bucket collisions between documents.
• We introduce a penalized Hamming function to approximate the cosine similarity.
• Both strategies improve the quality of the detected clusters.
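
The two similarity estimates named in the highlights can be illustrated with a minimal Python sketch. It shows the textbook forms only: the Jaccard similarity estimated as the fraction of colliding min-wise hash values, and the cosine similarity recovered from the Hamming distance between random-hyperplane bit signatures. The penalized Hamming function introduced in the paper is not reproduced here, since its definition is not given in this summary. All function names, the number of hash functions (256), and the linear hash family are assumptions made for illustration.

```python
import math
import random
import zlib

PRIME = (1 << 61) - 1  # large prime for the illustrative linear hash family


def minhash_signature(tokens, coeffs):
    """Return one min-wise hash value per (a, b) pair in coeffs."""
    # Map each distinct token to an integer id, then keep the minimum of
    # (a * id + b) mod PRIME for every hash function.
    ids = [zlib.crc32(t.encode("utf-8")) for t in set(tokens)]
    return [min((a * x + b) % PRIME for x in ids) for a, b in coeffs]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of positions where the two min-hash signatures collide."""
    matches = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return matches / len(sig_a)


def hyperplane_signature(vector, planes):
    """One bit per random hyperplane: which side of the plane the vector lies on."""
    return [
        1 if sum(v * p for v, p in zip(vector, plane)) >= 0 else 0
        for plane in planes
    ]


def estimate_cosine(bits_a, bits_b):
    """Standard (unpenalized) estimate: cos(pi * Hamming distance / signature length)."""
    hamming = sum(1 for x, y in zip(bits_a, bits_b) if x != y)
    return math.cos(math.pi * hamming / len(bits_a))


# Hypothetical setup: 256 hash functions and 256 hyperplanes in 3 dimensions.
rng = random.Random(0)
coeffs = [(rng.randrange(1, PRIME), rng.randrange(0, PRIME)) for _ in range(256)]
planes = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(256)]

doc_a = ["locality", "sensitive", "hashing"]
doc_b = ["sensitive", "hashing", "clustering"]
jaccard_ab = estimate_jaccard(
    minhash_signature(doc_a, coeffs), minhash_signature(doc_b, coeffs)
)

vec_a = [1.0, 2.0, 0.5]
vec_b = [0.9, 2.1, 0.4]
cosine_ab = estimate_cosine(
    hyperplane_signature(vec_a, planes), hyperplane_signature(vec_b, planes)
)
```

Here `doc_a` and `doc_b` share two of four distinct tokens, so `jaccard_ab` should land near the true Jaccard similarity of 0.5, with accuracy improving as the number of hash functions grows. Comparing short signatures instead of full documents is what makes these estimates practical for clustering high dimensional data.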
Keywords: Locality sensitive hashing, High dimensional clustering, Min-wise hashing, Random hyperplanes
Review history: Received 13 April 2015, Revised 6 June 2016, Accepted 7 June 2016, Available online 16 June 2016, Version of Record 23 June 2016.
Official paper URL: https://doi.org/10.1016/j.eswa.2016.06.008