Optimal outlier removal in high-dimensional spaces

作者:

Highlights:

摘要

We study the problem of finding an outlier-free subset of a set of points (or a probability distribution) in n-dimensional Euclidean space. As in [BFKV 99], a point x is defined to be a β-outlier if there exists some direction w in which its squared distance from the mean along w is greater than β times the average squared distance from the mean along w. Our main theorem is that for any ε>0, there exists a (1−ε) fraction of the original distribution that has no Onεb+lognε-outliers, improving on the previous bound of O(n7b/ε). This is asymptotically the best possible, as shown by a matching lower bound. The theorem is constructive, and results in a 11−ε approximation to the following optimization problem: given a distribution μ (i.e. the ability to sample from it), and a parameter ε>0, find the minimum β for which there exists a subset of probability at least (1−ε) with no β-outliers.

论文关键词:

论文评审过程:Received 10 September 2001, Revised 6 November 2002, Available online 15 October 2003.

论文官网地址:https://doi.org/10.1016/j.jcss.2003.07.013