Scaling entity resolution: A loosely schema-aware approach

Authors:

Highlights:

• An LSH-based attribute-match induction technique to extract loose schema information.

• An unsupervised meta-blocking approach based on loose schema information.

• An algorithm to scale any meta-blocking method on MapReduce-like systems.

• Extensive comparisons with existing meta-blocking methods on 7 real-world datasets.

Keywords: Entity resolution, Meta-blocking, Big data integration, Data cleaning, Apache Spark

Review history: Received 3 August 2018, Revised 16 February 2019, Accepted 17 March 2019, Available online 21 March 2019, Version of Record 8 April 2019.

Paper URL: https://doi.org/10.1016/j.is.2019.03.006