Minmax Circular Sector Arc for External Plagiarism’s Heuristic Retrieval stage
Authors:
Highlights:
• Locality-sensitive hashing algorithms for the nearest neighbor search problem are proposed.
• The algorithms represent sketches of documents as unique numeric values.
• The algorithms reduce hashing and retrieval time by 50% and 33%, respectively.
• The number of permutations must be strictly controlled to obtain the desired recall.
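
For orientation, the keywords name the min-max hash method and pairwise Jaccard similarity estimation. The sketch below illustrates generic min-max hashing only; it is not the Circular Sector Arc algorithms this paper proposes. The function names, the linear hash family modulo a Mersenne prime, and the CRC32 token mapping are all assumptions made for illustration. Each hash function contributes both its minimum and its maximum to the sketch, so a sketch of a given length needs half as many hash functions as plain min-hash.

```python
import random
import zlib

# Assumption: a Mersenne prime modulus for a simple linear hash family.
PRIME = (1 << 61) - 1


def make_hash_params(num_hashes, seed=0):
    """Draw (a, b) pairs for hypothetical hash functions h(x) = (a*x + b) mod PRIME."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME))
            for _ in range(num_hashes)]


def minmax_sketch(tokens, params):
    """Return a sketch holding the min and the max hashed value per hash function."""
    # Assumption: tokens are strings, mapped to integers with CRC32.
    ids = {zlib.crc32(t.encode("utf-8")) for t in tokens}
    if not ids:
        raise ValueError("cannot sketch an empty token set")
    sketch = []
    for a, b in params:
        values = [(a * x + b) % PRIME for x in ids]
        sketch.append(min(values))
        sketch.append(max(values))
    return sketch


def estimate_jaccard(sketch_a, sketch_b):
    """Estimate Jaccard similarity as the fraction of matching sketch components."""
    matches = sum(1 for u, v in zip(sketch_a, sketch_b) if u == v)
    return matches / len(sketch_a)


def exact_jaccard(tokens_a, tokens_b):
    """Exact Jaccard similarity of two token collections, for comparison."""
    set_a, set_b = set(tokens_a), set(tokens_b)
    return len(set_a & set_b) / len(set_a | set_b)


if __name__ == "__main__":
    params = make_hash_params(num_hashes=64)
    doc_a = "the quick brown fox jumps over the lazy dog".split()
    doc_b = "the quick brown cat jumps over the lazy dog".split()
    estimate = estimate_jaccard(minmax_sketch(doc_a, params),
                                minmax_sketch(doc_b, params))
    print("estimated:", estimate, "exact:", exact_jaccard(doc_a, doc_b))
```

With few hash functions the estimate can deviate noticeably from the exact value, which is in line with the highlight about controlling the number of permutations.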
Keywords: External Plagiarism, Heuristic Retrieval, Locality-sensitive hashing, High-dimensional spaces, Pattern clustering, Approximate nearest neighbor search, Hashing method, Hashing time reduction, Min–max hash method, Pairwise Jaccard similarity estimation, Scalable similarity search, Approximation algorithms, Computational efficiency, Nearest neighbor searches, Jaccard similarity
Review history: Received 4 January 2017, Revised 28 June 2017, Accepted 12 August 2017, Available online 18 August 2017, Version of Record 18 October 2017.
Official paper URL: https://doi.org/10.1016/j.knosys.2017.08.013