Bitpart: Exact metric search in high(er) dimensions

Authors:

Highlights:

Abstract

We define BitPart (Bitwise representations of binary Partitions), a novel exact search mechanism intended for use in high-dimensional spaces. In outline, a fixed set of reference objects is used to define a large set of regions within the original space, and each data item is characterised according to its containment within these regions. In contrast with other mechanisms, only a subset of this information is selected, according to the query, before a search within the re-cast space is performed. Partial data representations are accessed only if they are known to be potentially useful towards the calculation of the exact query solution.

Our mechanism requires Ω(N log N) space to evaluate a query, where N is the cardinality of the data, and therefore does not scale as well as previously defined mechanisms with low-dimensional data. However, it has recently been shown that, for a nearest neighbour search in high dimensions, a sequential scan of the data is essentially unavoidable. This result has been suspected for a long time, and has been referred to as the curse of dimensionality in this context.

In the light of this result, the compromise achieved by this work is to make the best possible use of the available fast memory, and to offer great potential for parallel query evaluation. To our knowledge, it gives the best compromise currently known for performing exact search over data whose dimensionality is too high to allow the useful application of metric indexing, yet is still sufficiently low to give at least some traction from the metric and supermetric properties.
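The outline in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes Euclidean data, uses only ball-shaped regions around each reference object, and stores one boolean array per region where a real system would pack bits. The names `build_bitmaps` and `range_search` and their parameters are hypothetical. For a range query with threshold `t`, a region's bitmap is read only when the triangle inequality shows that the query ball lies wholly inside or wholly outside that region; all other bitmaps are skipped, and the surviving candidates are verified against the original data so the result stays exact.

```python
import numpy as np


def build_bitmaps(data, refs, radii):
    """Characterise each data item by its containment in ball regions.

    Returns a boolean array bits[i, j, k] that is True when data[k] lies
    inside the ball centred on refs[i] with radius radii[j].
    """
    # Distances from every reference object to every data item: shape (R, N).
    d = np.linalg.norm(data[None, :, :] - refs[:, None, :], axis=2)
    return d[:, None, :] <= radii[None, :, None]


def range_search(query, t, data, refs, radii, bits):
    """Exact range search: indices of all items within distance t of query."""
    candidates = np.ones(len(data), dtype=bool)
    dq = np.linalg.norm(refs - query, axis=1)
    for i, dqi in enumerate(dq):
        for j, r in enumerate(radii):
            if dqi + t <= r:
                # Query ball is wholly inside the region, so every
                # solution must also be inside it.
                candidates &= bits[i, j]
            elif dqi - t > r:
                # Query ball is wholly outside the region, so every
                # solution must also be outside it.
                candidates &= ~bits[i, j]
            # Otherwise the region boundary crosses the query ball and
            # this bitmap carries no usable information; it is not read.
    # Verify the remaining candidates against the original space.
    idx = np.flatnonzero(candidates)
    dist = np.linalg.norm(data[idx] - query, axis=1)
    return idx[dist <= t]
```

In this sketch the exclusion step is a sequence of bitwise operations over whole bitmaps, which is the part that lends itself to fast memory and parallel evaluation; the choice of reference objects and radii is left arbitrary here.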

Keywords: Similarity search, Metric space, Metric indexing, Metric search, Four-point property

Review history: Received 10 September 2019, Revised 9 December 2019, Accepted 7 January 2020, Available online 4 February 2020, Version of Record 15 October 2020.

Official paper URL: https://doi.org/10.1016/j.is.2020.101493