Effective pruning for XML structural match queries

Authors:

Highlights:

Abstract

Extensible Markup Language (XML) is becoming the de facto standard for exchanging information over the Internet, and the number of XML documents is growing accordingly. This growth has drawn increasing attention from the research community, and one of the main challenges is processing large collections of XML documents efficiently. Most current methods suffer from two drawbacks: they cannot complement one another to further improve query processing performance without modifying the existing query processing engine, and they cannot be customized for different structural and usage characteristics. This paper presents a new approach to structural query processing, the Property-Driven Pruning Algorithm (PDPA), which overcomes both drawbacks by offering independence from the structural query processing method together with plug-and-play properties. PDPA consists of two phases, an offline phase and an online phase. During the offline phase, a list of pruning properties is added to the original XML documents. During the online phase, input queries are modified with a list of carefully selected properties, which are used during query processing to quickly prune non-matching candidate documents. We propose an exhaustive algorithm and a greedy heuristic algorithm. Experimental results for both algorithms show that PDPA can improve XML query processing performance by up to twofold in a variety of situations.
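
The two-phase idea described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not say which pruning properties PDPA uses or how they are selected, so the two properties here (document depth and the set of tag names), the attribute names `pdpa-depth` and `pdpa-tags`, and the helpers `annotate`, `may_match`, `full_match`, and `answer` are all assumptions made for illustration. Queries are simplified to a parent/child chain of tag names.

```python
import xml.etree.ElementTree as ET


def depth(elem):
    """Height of the subtree rooted at elem (a lone element has depth 1)."""
    return 1 + max((depth(child) for child in elem), default=0)


def annotate(xml_text):
    """Offline phase (sketch): store pruning properties on the root element.

    The properties and attribute names are hypothetical, not from the paper.
    """
    root = ET.fromstring(xml_text)
    tags = sorted({e.tag for e in root.iter()})
    root.set("pdpa-depth", str(depth(root)))
    root.set("pdpa-tags", " ".join(tags))
    return root


def may_match(root, query_tags):
    """Online phase (sketch): cheap check run before structural matching.

    A parent/child chain of k tags needs every tag to occur in the document
    and needs a document depth of at least k, so a False result is safe.
    """
    doc_tags = set(root.get("pdpa-tags").split())
    doc_depth = int(root.get("pdpa-depth"))
    return set(query_tags) <= doc_tags and len(query_tags) <= doc_depth


def full_match(root, query_tags):
    """Stand-in for the unmodified query engine: look for the tag chain."""
    first, rest = query_tags[0], query_tags[1:]
    for elem in root.iter(first):
        if not rest or elem.find("/".join(rest)) is not None:
            return True
    return False


def answer(roots, query_tags):
    """Return indices of matching documents, pruning candidates first."""
    return [
        i for i, root in enumerate(roots)
        if may_match(root, query_tags) and full_match(root, query_tags)
    ]
```

Calling `answer` on a list of annotated documents with a query such as `["book", "author", "name"]` skips the full structural match for any document that lacks one of the tags or is too shallow. The property check only ever rejects documents that cannot match, so the engine behind `full_match` stays unchanged, which mirrors the independence goal stated in the abstract.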

Keywords: XML query processing, Pruning, Semi-structured data, Structural match queries

Review history: Received 27 October 2008, Revised 11 February 2010, Accepted 11 February 2010, Available online 4 March 2010.

Official URL: https://doi.org/10.1016/j.datak.2010.02.004