Storing data once in M-trees and PM-trees: Revisiting the building principles of metric access methods

Abstract

Since the introduction of the M-tree, a fundamental tree-based data structure for indexing multi-dimensional information, several structural enhancements have been proposed. One of the most effective is the use of additional global pivots, which resulted in the PM-tree. Both indexing structures, however, can store the same data element in multiple nodes. In this article, we revisit the M-tree and the PM-tree and propose a new construction algorithm that stores each data element only once in the tree hierarchy. The main challenge is to properly select data elements when an inner node must be split. To address it, we propose an approach based on aggregate nearest neighbor queries. The new algorithms allow the search result set to be built while data elements are evaluated for pruning during traversal, which speeds up both k-nearest neighbor and range searches. We conducted an extensive set of experiments on different real datasets. The results show that the proposed algorithms considerably outperform the standard M-tree and PM-tree.
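To make the term concrete, the following is a minimal brute-force sketch of an aggregate nearest neighbor query: given a set of query objects, it returns the candidate that minimizes an aggregate (here sum or max) of its distances to all of them. The function name, its parameters, and the linear scan are illustrative assumptions. The abstract does not specify how such queries are evaluated or how their result is used during an inner node split, so this shows only the query definition, not the proposed construction algorithm.

```python
import math
from typing import Callable, Iterable, Sequence, TypeVar

T = TypeVar("T")


def aggregate_nearest_neighbor(
    candidates: Sequence[T],
    query_set: Sequence[T],
    dist: Callable[[T, T], float],
    agg: Callable[[Iterable[float]], float] = sum,
) -> T:
    """Return the candidate with the smallest aggregate distance to query_set.

    Hypothetical helper for illustration only: a linear scan over all
    candidates, using `agg` (e.g. sum or max) to combine the distances.
    """
    if not candidates:
        raise ValueError("candidates must not be empty")
    if not query_set:
        raise ValueError("query_set must not be empty")
    # Score each candidate by aggregating its distances to every query object,
    # then keep the candidate with the lowest score.
    return min(candidates, key=lambda c: agg(dist(c, q) for q in query_set))


if __name__ == "__main__":
    # Toy 2-D points under the Euclidean metric (math.dist, Python 3.8+).
    points = [(0, 0), (2, 0), (4, 0), (2, 3)]
    queries = [(0, 0), (4, 0), (2, 1)]
    print(aggregate_nearest_neighbor(points, queries, math.dist))           # sum
    print(aggregate_nearest_neighbor(points, queries, math.dist, agg=max))  # max
```

Because the helper relies only on a distance function, it works in any metric space, which matches the setting of metric access methods. A real index would answer the query with pruning rather than a full scan.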

Keywords: Metric access methods, Ball-partitioning indexing, M-tree, PM-tree, k-nearest neighbor query, Range query

Review history: Received 19 February 2020, Revised 3 September 2021, Accepted 13 September 2021, Available online 25 September 2021, Version of Record 1 October 2021.

Official paper URL: https://doi.org/10.1016/j.is.2021.101896