Entity identification for heterogeneous database integration—a multiple classifier system approach and empirical evaluation

Abstract

Entity identification, i.e., detecting semantically corresponding records in heterogeneous data sources, is a critical step in integrating those sources. The objective of this research is to develop and evaluate a novel multiple classifier system approach that improves entity identification accuracy. We apply classification techniques drawn from statistical pattern recognition, machine learning, and artificial neural networks to determine whether two records from different data sources represent the same real-world entity. We then combine multiple classifiers in a variety of ways to improve classification accuracy. In this paper, we report promising empirical results that demonstrate a performance improvement from combining multiple classifiers.
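As a rough illustration of the general idea (not the authors' method), entity identification can be framed as binary classification over record pairs, with several base classifiers combined by majority vote. The sketch below is a minimal example under stated assumptions: the field names (`name`, `address`, `city`), the three rule-based classifiers, and all thresholds are hypothetical and chosen only for illustration; the paper's actual classifiers and combination schemes are not specified in this abstract.

```python
from difflib import SequenceMatcher

# Hypothetical record schema: each record is a dict with these string fields.
FIELDS = ("name", "address", "city")

def similarity_features(rec_a, rec_b):
    """Map a pair of records to per-field string similarity scores in [0, 1]."""
    return [
        SequenceMatcher(None, rec_a[f].lower(), rec_b[f].lower()).ratio()
        for f in FIELDS
    ]

# Three deliberately simple, hypothetical base classifiers. Each maps a
# feature vector to True (same entity) or False (different entities).
def mean_rule(features):
    return sum(features) / len(features) >= 0.75

def name_rule(features):
    return features[0] >= 0.8

def weakest_field_rule(features):
    return min(features) >= 0.6

BASE_CLASSIFIERS = [mean_rule, name_rule, weakest_field_rule]

def majority_vote(features, classifiers=BASE_CLASSIFIERS):
    """Combine base classifiers: declare a match if more than half vote True."""
    votes = sum(1 for clf in classifiers if clf(features))
    return votes * 2 > len(classifiers)

def same_entity(rec_a, rec_b):
    """Decide whether two records refer to the same real-world entity."""
    return majority_vote(similarity_features(rec_a, rec_b))
```

Majority voting is only one of many possible combination schemes; in practice the base classifiers would be trained on labeled record pairs rather than hand-written as fixed rules.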

Keywords: Heterogeneous database integration, Entity identification, Multiple classifier system

Article history: Received 16 May 2003, Revised 21 October 2003, Accepted 3 November 2003, Available online 16 December 2003.

Official paper URL: https://doi.org/10.1016/j.is.2003.11.001