Large scale instance matching via multiple indexes and candidate selection
Abstract
Instance matching aims to discover the links between different descriptions of the same real-world objects across heterogeneous data sources. With the rapid development of the Semantic Web, and of linked data in particular, automatic instance matching has become a fundamental issue for ontological data sharing and integration. Ontologies often contain instances at large scale: millions, or even hundreds of millions, of objects. Directly applying earlier schema-level ontology matching methods at this scale is infeasible. In this paper, we systematically investigate the characteristics of instance matching and then propose a scalable and efficient instance matching approach named VMI. VMI generates multiple vectors for the different kinds of information contained in the ontology instances, and uses a set of inverted-index-based rules to obtain the primary matching candidates. It then employs user-customized property values to further eliminate incorrect matches. Finally, the similarities of the matching candidates are computed as integrated vector distances, and the matching results are extracted. Experiments on the instance tracks of OAEI 2009 and OAEI 2010 show that the proposed method achieves better effectiveness and efficiency than the top performer RiMOM on most of the datasets (a speedup of more than 100 times and slightly better performance, +3.0% to 5.0% in terms of F1-score). Experiments on Linked MDB and DBpedia show that VMI obtains results comparable with those of the SILK system (about 26,000 results of good quality).
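The general pattern the abstract describes (index first, compare only the selected candidates, then score by vector similarity) can be illustrated with a minimal sketch. This is not VMI's implementation: the single bag-of-words vector per instance, the whitespace tokenizer, the cosine measure, and the names `build_inverted_index`, `select_candidates`, `match`, `min_shared`, and `threshold` are all assumptions made for illustration, and the user-customized property filtering step is omitted.

```python
from collections import Counter, defaultdict
from math import sqrt


def tokenize(text):
    """Hypothetical tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


def build_inverted_index(instances):
    """Map each token to the set of instance ids whose description contains it."""
    index = defaultdict(set)
    for inst_id, text in instances.items():
        for token in set(tokenize(text)):
            index[token].add(inst_id)
    return index


def select_candidates(source, target_index, min_shared=1):
    """Return (source_id, target_id) pairs that share at least min_shared tokens."""
    candidates = set()
    for src_id, text in source.items():
        shared = Counter()
        for token in set(tokenize(text)):
            # .get avoids creating empty entries in the defaultdict
            for tgt_id in target_index.get(token, ()):
                shared[tgt_id] += 1
        candidates.update((src_id, t) for t, n in shared.items() if n >= min_shared)
    return candidates


def cosine(a, b):
    """Cosine similarity between the token-count vectors of two strings."""
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(va[t] * vb[t] for t in va)
    norm = sqrt(sum(v * v for v in va.values())) * sqrt(sum(v * v for v in vb.values()))
    return dot / norm if norm else 0.0


def match(source, target, min_shared=1, threshold=0.5):
    """Score only the index-selected candidates and keep those above threshold."""
    index = build_inverted_index(target)
    scored = {}
    for s, t in select_candidates(source, index, min_shared):
        score = cosine(source[s], target[t])
        if score >= threshold:
            scored[(s, t)] = score
    return scored


if __name__ == "__main__":
    source = {"a1": "The Matrix 1999", "a2": "Blade Runner 1982"}
    target = {"b1": "matrix the 1999 film", "b2": "runner blade", "b3": "Alien 1979"}
    for pair, score in sorted(match(source, target).items()):
        print(pair, round(score, 3))
```

The point of the index is that a target instance sharing no token with a source instance (here `b3`) is never compared at all, which is where the savings over all-pairs comparison would come from on large instance sets.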
Keywords: Heterogeneous data, Semantic web, Instance matching, Ontology matching, Linked data
Article history: Received 16 November 2011, Revised 30 May 2013, Accepted 6 June 2013, Available online 18 June 2013.
Paper URL: https://doi.org/10.1016/j.knosys.2013.06.004