Learning object identification rules for information integration

Abstract

When integrating information from multiple websites, the same data objects can exist in inconsistent text formats across sites, making it difficult to identify matching objects using exact text match. We have developed an object identification system called Active Atlas, which compares the objects' shared attributes in order to identify matching objects. Certain attributes are more important than others for deciding whether a mapping should exist between two objects. Previous methods of object identification have required manual construction of object identification rules, or mapping rules, for determining the mappings between objects. This manual process is time-consuming and error-prone. In our approach, Active Atlas learns to tailor mapping rules, through limited user input, to a specific application domain. The experimental results demonstrate that we achieve higher accuracy and require less user involvement than previous methods across various application domains.
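
To make the idea of comparing shared attributes with unequal importance concrete, here is a minimal Python sketch. It is not the Active Atlas algorithm: the function names, the hand-set weights, the threshold, and the use of `difflib.SequenceMatcher` as the text similarity measure are all assumptions chosen for illustration. In the approach described above, rules of this kind would be learned from limited user input rather than fixed by hand.

```python
from difflib import SequenceMatcher


def attribute_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two attribute values.

    Case and surrounding whitespace are ignored. SequenceMatcher is a
    stand-in measure (an assumption), not the one used by Active Atlas.
    """
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def mapping_score(obj_a: dict, obj_b: dict, weights: dict) -> float:
    """Weighted average of similarities over attributes both objects share.

    `weights` is a hypothetical, hand-set importance per attribute; a
    learning system would estimate these from labeled examples instead.
    """
    shared = [k for k in weights if k in obj_a and k in obj_b]
    total = sum(weights[k] for k in shared)
    if total == 0:
        return 0.0
    weighted = sum(
        weights[k] * attribute_similarity(obj_a[k], obj_b[k]) for k in shared
    )
    return weighted / total


def is_match(obj_a: dict, obj_b: dict, weights: dict, threshold: float = 0.8) -> bool:
    """Declare a mapping when the weighted score reaches the threshold."""
    return mapping_score(obj_a, obj_b, weights) >= threshold


# Illustrative records for the same restaurant as two sites might list it.
site_a = {"name": "Art's Deli", "street": "12224 Ventura Blvd.", "phone": "818-756-4124"}
site_b = {"name": "Art's Delicatessen", "street": "12224 Ventura Boulevard", "phone": "818/756-4124"}

# Hypothetical weights: phone and name count more than street here.
example_weights = {"name": 2.0, "street": 1.0, "phone": 3.0}

if __name__ == "__main__":
    print(mapping_score(site_a, site_b, example_weights))
    print(is_match(site_a, site_b, example_weights))
```

An exact text match would reject this pair on every attribute, while the weighted score still reflects that the two records largely agree. Choosing the weights and threshold by hand is the time-consuming, error-prone step that the abstract proposes to replace with learning.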

Keywords: Information integration, Machine learning, Data cleaning, Record linkage, Object identification, Active learning

Review history: Available online 6 September 2001.

Official paper URL: https://doi.org/10.1016/S0306-4379(01)00042-4