Configurable assembly of classification rules for enhancing entity resolution results

Authors:

Highlights:

• We propose four heuristics for turning the results produced by logical rules into confidence scores (i.e., continuous values quantifying how confident a rule is that an entity pair is a duplicate).

• We propose a novel auto-tuning algorithm for classifying duplicate entities based on confidence scores.

• We propose an efficient algorithm for tuning the parameters of the Rule Assembler in scenarios where training data is available.

• We propose a systematic approach to map user preferences regarding precision and recall into parameters of the Rule Assembler.

• We present an experimental evaluation of the proposed approach using both real-world and synthetic datasets.

Keywords:

Review history: Received 19 July 2019, Revised 4 February 2020, Accepted 7 February 2020, Available online 27 February 2020, Version of Record 27 February 2020.

Official paper URL: https://doi.org/10.1016/j.ipm.2020.102224