Improving XML schema matching performance using Prüfer sequences

作者:

Highlights:

摘要

Schema matching is a critical step for discovering semantic correspondences among elements in many data-shared applications. Most of existing schema matching algorithms produce scores between schema elements resulting in discovering only simple matches. Such results partially solve the problem. Identifying and discovering complex matches is considered one of the biggest obstacle towards completely solving the schema matching problem. Another obstacle is the scalability of matching algorithms on large number and large-scale schemas. To tackle these challenges, in this paper, we propose a new XML schema matching framework based on the use of Prüfer encoding. In particular, we develop and implement the XPrüM system, which consists mainly of two parts—schema preparation and schema matching. First, we parse XML schemas and represent them internally as schema trees. Prüfer sequences are constructed for each schema tree and employed to construct a sequence representation of schemas. We capture schema tree semantic information in Label Prüfer Sequences (LPS) and schema tree structural information in Number Prüfer Sequences (NPS). Then, we develop a new structural matching algorithm exploiting both LPS and NPS. To cope with complex matching discovery, we introduce the concept of compatible nodes to identify semantic correspondences across complex elements first, then the matching process is refined to identify correspondences among simple elements inside each pair of compatible nodes. Our experimental results demonstrate the performance benefits of the XPrüM system.

论文关键词:XML schema matching,Prufer Sequences,Structural matching,Matching performance

论文评审过程:Received 30 June 2008, Revised 8 January 2009, Accepted 11 January 2009, Available online 22 January 2009.

论文官网地址:https://doi.org/10.1016/j.datak.2009.01.001