Matching large schemas: Approaches and evaluation

Authors:

Highlights:

Abstract

Current schema matching approaches still need improvement for large and complex schemas. The large search space increases both the likelihood of false matches and execution times. Further difficulties for schema matching are posed by the high expressive power and versatility of modern schema languages, in particular user-defined types and classes, component reuse capabilities, and support for distributed schemas and namespaces. To better assist the user in matching complex schemas, we have developed a new generic schema matching tool, COMA++, which provides a library of individual matchers and a flexible infrastructure to combine the matchers and refine their results. Different match strategies can be applied, including a new scalable approach to identify context-dependent correspondences between schemas with shared elements, and a fragment-based match approach that decomposes a large match task into smaller tasks. We conducted a comprehensive evaluation of the match strategies using large e-Business standard schemas. Besides providing helpful insights for future match implementations, the evaluation demonstrated the practicability of our system for matching large schemas.
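
The idea of combining individual matchers can be illustrated with a minimal sketch. This is not the COMA++ API: the function names (`name_matcher`, `type_matcher`, `combine`, `match`), the averaging of scores, the 0.7 threshold, and the toy schemas are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical schema elements: dicts with a name and a data type.

def name_matcher(a, b):
    """Similarity of element names in [0, 1], case-insensitive."""
    return SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()

def type_matcher(a, b):
    """1.0 if the declared data types are identical, else 0.0."""
    return 1.0 if a["type"] == b["type"] else 0.0

def combine(matchers, a, b):
    """Aggregate the individual matcher scores (assumption: plain average)."""
    scores = [m(a, b) for m in matchers]
    return sum(scores) / len(scores)

def match(source, target, matchers, threshold=0.7):
    """For each source element, keep its best target if the score is high enough."""
    result = []
    for a in source:
        best = max(target, key=lambda b: combine(matchers, a, b))
        score = combine(matchers, a, best)
        if score >= threshold:
            result.append((a["name"], best["name"], score))
    return result

source = [
    {"name": "CustomerName", "type": "string"},
    {"name": "OrderDate", "type": "date"},
]
target = [
    {"name": "custName", "type": "string"},
    {"name": "orderDate", "type": "date"},
    {"name": "total", "type": "decimal"},
]

for src, tgt, score in match(source, target, [name_matcher, type_matcher]):
    print(f"{src} -> {tgt} ({score:.2f})")
```

Under the same assumptions, a fragment-based strategy would call `match` separately on pairs of smaller sub-lists of `source` and `target` rather than on the full schemas, which shrinks the search space of each individual match task.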

Keywords: Schema matching, Schema matching evaluation, Data integration, Schema integration, Model management

Review history: Received 7 April 2005, Revised 18 April 2006, Accepted 26 September 2006, Available online 24 October 2006.

Official paper URL: https://doi.org/10.1016/j.is.2006.09.002