Scaling entity resolution: A loosely schema-aware approach
Authors:
Highlights:
• An LSH-based attribute-match induction technique to extract loose schema information.
• An unsupervised meta-blocking approach based on loose schema information.
• An algorithm to scale any meta-blocking method on MapReduce-like systems.
• Extensive comparisons with existing meta-blocking methods on 7 real-world datasets.
Keywords: Entity resolution, Meta-blocking, Big data integration, Data cleaning, Apache Spark
Review history: Received 3 August 2018, Revised 16 February 2019, Accepted 17 March 2019, Available online 21 March 2019, Version of Record 8 April 2019.
Paper URL: https://doi.org/10.1016/j.is.2019.03.006