A genetic algorithm based entity resolution approach with active learning

Authors: Chenchen Sun, Derong Shen, Yue Kou, Tiezheng Nie, Ge Yu

Abstract

Entity resolution is a key aspect of data quality and data integration: it identifies which records in data sources correspond to the same real-world entity. Many existing approaches rely on manually designed match rules, which require domain knowledge and are time-consuming to create. We propose a novel genetic algorithm based entity resolution approach with active learning. It learns effective match rules by logically combining comparisons over several different attributes, each with a suitable threshold. We use active learning to reduce the amount of manually labeled data and to speed up the learning process. An extensive evaluation shows that the proposed approach outperforms state-of-the-art entity resolution approaches in accuracy.
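To make the notion of a match rule concrete, the following is a minimal sketch of a rule that logically combines per-attribute comparisons with thresholds. The abstract does not specify the rule encoding or the similarity measure, so everything here is an assumption for illustration: the class names `Comparison`, `And`, and `Or`, the record fields, the threshold values, and the use of `difflib.SequenceMatcher` as a stand-in similarity function are all hypothetical. The genetic search and the active learning loop are not shown.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Union


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1] (stand-in measure, an assumption)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


@dataclass(frozen=True)
class Comparison:
    """Leaf node: one attribute's similarity checked against a threshold."""
    attribute: str
    threshold: float

    def matches(self, r1: dict, r2: dict) -> bool:
        return similarity(r1[self.attribute], r2[self.attribute]) >= self.threshold


@dataclass(frozen=True)
class And:
    """Inner node: both sub-rules must hold."""
    left: "Rule"
    right: "Rule"

    def matches(self, r1: dict, r2: dict) -> bool:
        return self.left.matches(r1, r2) and self.right.matches(r1, r2)


@dataclass(frozen=True)
class Or:
    """Inner node: at least one sub-rule must hold."""
    left: "Rule"
    right: "Rule"

    def matches(self, r1: dict, r2: dict) -> bool:
        return self.left.matches(r1, r2) or self.right.matches(r1, r2)


Rule = Union[Comparison, And, Or]

# Hypothetical rule: names must be similar, and either city or phone must agree.
rule = And(
    Comparison("name", 0.8),
    Or(Comparison("city", 0.9), Comparison("phone", 1.0)),
)

# Hypothetical records, invented for this example.
a = {"name": "Jon Smith", "city": "Boston", "phone": "555-0100"}
b = {"name": "John Smith", "city": "boston", "phone": "555-0199"}
c = {"name": "Mary Jones", "city": "Boston", "phone": "555-0100"}

print(rule.matches(a, b))  # similar name and same city
print(rule.matches(a, c))  # same city and phone, but dissimilar name
```

In a sketch like this, a genetic algorithm would treat each rule tree as an individual, varying the attributes, thresholds, and logical operators, and scoring candidates against labeled record pairs.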

Keywords: entity resolution, genetic algorithm, active learning, data quality, data integration

Paper URL: https://doi.org/10.1007/s11704-015-5276-6