Exploiting the structure of furthest neighbor search for fast approximate results

作者:

Highlights:

摘要

We present a novel strategy for approximate furthest neighbor search that selects a set of candidate points using the data distribution. This strategy leads to an algorithm, which we call DrusillaSelect, that is able to outperform existing approximate furthest neighbor strategies. Our strategy is motivated by a study of the behavior of the furthest neighbor search problem, which has significantly different structure than the nearest neighbor search problem, and can be understood with the help of an information-theoretic hardness measure that we introduce. We also present a variant of the algorithm that gives an absolute approximation guarantee; under some assumptions, the guaranteed approximation can be achieved in provably less time than brute-force search. Performance studies indicate that DrusillaSelect can achieve comparable levels of approximation to other algorithms, even on the hardest datasets, while giving up to an order of magnitude speedup. An implementation is available in the mlpack machine learning library (found at http://www.mlpack.org).

论文关键词:Furthest neighbor search,Farthest neighbor search,Hubness and anti-hubness

论文评审过程:Received 14 March 2017, Revised 10 September 2017, Accepted 30 December 2017, Available online 5 January 2018, Version of Record 6 December 2018.

论文官网地址:https://doi.org/10.1016/j.is.2017.12.010