Automatically generating data linkages using class-based discriminative properties

Authors:

Highlights:

Abstract

A challenge for Linked Data is linking instances from different data sources that denote the same real-world object. Millions of high-quality owl:sameAs linkages have already been generated, but a considerable number of potential linkages remain undiscovered. Traditional similarity-based approaches to this data linkage problem do not scale well because they exhaustively compare every pair of instances. In this paper, we propose an automatic approach to data linkage generation for Linked Data. Specifically, a highly accurate training set is generated automatically using equivalence reasoning and common prefix blocking. The contexts of the instances in the training set are extracted and matched pairwise to learn discriminative property pairs that support linkage discovery. For a particular class pair and pay-level-domain pair, the discriminability of each property pair is measured, and a few highly discriminative property pairs are aggregated so that they can be reused to link instances between the same classes and domains. The experimental results show that our approach achieves good accuracy compared with several more complex methods on two OAEI tests and the BTC2011 dataset.
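
The idea of measuring the discriminability of property pairs on a training set can be illustrated with a minimal Python sketch. The abstract does not give the paper's actual discriminability measure, so the scoring rule here (the share of a property pair's value agreements that fall on linked instance pairs) is an assumption, as are the function names, the exact-match comparison after normalization, and the toy data.

```python
from collections import Counter
from itertools import product


def normalize(value):
    """Lowercase and collapse whitespace so trivially different literals agree."""
    return " ".join(str(value).lower().split())


def property_pair_scores(linked, unlinked):
    """Score each property pair (p, q) on a training set.

    Assumed measure (not taken from the paper): among all instance pairs
    whose values for p and q agree, the fraction that are linked pairs.
    """
    agree_pos, agree_neg = Counter(), Counter()
    for counter, pairs in ((agree_pos, linked), (agree_neg, unlinked)):
        for a, b in pairs:
            for (p, v), (q, w) in product(a.items(), b.items()):
                if normalize(v) == normalize(w):
                    counter[(p, q)] += 1
    return {
        pair: pos / (pos + agree_neg[pair])
        for pair, pos in agree_pos.items()
    }


def select_top(scores, k):
    """Keep the k highest-scoring property pairs for later reuse."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical training data: each instance is a dict of property -> value.
linked = [
    ({"label": "Alice Smith", "country": "UK"},
     {"name": "alice smith", "nation": "UK"}),
    ({"label": "Bob Jones", "country": "UK"},
     {"name": "Bob  Jones", "nation": "UK"}),
]
unlinked = [
    ({"label": "Alice Smith", "country": "UK"},
     {"name": "Bob Jones", "nation": "UK"}),
]

scores = property_pair_scores(linked, unlinked)
top = select_top(scores, k=1)
```

In this toy data the (label, name) pair agrees only on linked instances, while (country, nation) also agrees on an unlinked pair, so the former scores higher. In the setting the abstract describes, such scores would be computed separately for each class pair and pay-level-domain pair.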

Keywords: Data linkage, Instance matching, Object coreference resolution, Semantic Web, Discriminability, Ontologies

Review history: Received 17 June 2013, Revised 1 March 2014, Accepted 4 March 2014, Available online 27 March 2014.

Official URL: https://doi.org/10.1016/j.datak.2014.03.001