Extend tree edit distance for effective object identification

作者:Yue Wang, Hongzhi Wang, Liyan Zhang, Yang Wang, Jianzhong Li, Hong Gao

摘要

Similarity join on XML documents which are usually modeled as rooted ordered labeled trees is widely applied, due to the ambiguity of references to the real-world objects. The conventional method dealing with this issue is based on tree edit distance, which is shortage of flexibility and efficiency. In this paper, we propose two novel edit operations together with extended tree edit distance, which can achieve good performance in similarity matching with hierarchical data structures [the run-time is \(O(n^{3})\) in the worst case]. And then, we propose \(k\)-generation set distance as a good approximation of the tree edit distance to further improve the join efficiency with quadric time complexity. Experiments on real and synthetic databases demonstrate the benefit of our method in efficiency and scalability.

论文关键词:XML data matching, New edit operations, Extended tree edit distance, \(k\)-Generation set distance, Cluster

论文评审过程:

论文官网地址:https://doi.org/10.1007/s10115-014-0816-1