Subspace-based outlier detection using linear programming and heuristic techniques
作者:
Highlights:
•
摘要
A useful strategy to perform outlier detection (OD) in highdimensional data, especially in the presence of multiple classes of outliers, is to decompose the outlier detection problem into a set of relevant subspace selection (RSS) problems. In this approach, one relevant subspace is selected locally for each data point, and then the outlierness of that data point within its relevant subspaces is evaluated. The k-nearest neighbor algorithm has been used in most local-based subspace selection methods for OD application in high dimensional data. However, such techniques suffer from the curse of dimensionality, exhibit high computational complexity, and also, it may not be suitable for arbitrary distributions of the data. Additionally, the subspace nearest neighbor search (SNNS) problem in high dimensional data is rarely discussed in these methods. In this paper, we formulate the local selection of relevant subspace method for OD problem that employs a linear programming (LP) model with the objective function of local sparsity maximization. Our formulation for local selection of relevant subspace as an LP also considers the subspace nearest neighbor search problem to simultaneously solve these two subproblems (local selection of relevant subspace and subspace nearest neighbor search) that are tightly coupled. In addition, a novel heuristic algorithm, with linear complexity with respect to the number of dimensions, is developed to solve the LP model through an iterative procedure that alternates between two algorithms of subspace nearest neighbor search and RSS. Experimental results on both synthetic and real datasets show the feasibility of our formulation, the effectiveness of the heuristic algorithm, and also show the superiority of our proposed subspace outlier detection algorithm based on linear programming and heuristic techniques (SODLPH) to other published studies in terms of detection accuracy. Assessing by the average of the AUC on seven real-world UCI datasets, the performance of our proposed method has improved at least 8.72% in comparison with other related published works.
论文关键词:Outlier detection,High dimensional data,Subspace selection,Subspace nearest neighbor search,Linear programming model,Heuristic algorithm
论文评审过程:Received 17 October 2021, Revised 16 June 2022, Accepted 21 June 2022, Available online 30 June 2022, Version of Record 8 July 2022.
论文官网地址:https://doi.org/10.1016/j.eswa.2022.117955