A genetic approach for efficient outlier detection in projected space
Abstract
This paper presents a genetic solution to the outlier detection problem. The essential idea is to define outliers by examining those projections of the data along which the data points behave abnormally or inconsistently, as measured by their sparsity values. A partitioning method divides the data set into groups such that all objects in a group can be considered to behave similarly, and the groups that contain outliers are then identified. The algorithm assigns an 'outlier-ness' value that gives a relative measure of how strong an outlier group is. An evolutionary search technique determines the projections of the data over which the outliers can be identified. A new data structure, the grid count tree (GCT), quickly determines the number of points within any grid defined over the projected space and thereby speeds up computation of the sparsity factor. A new crossover operator is also defined for this purpose. The proposed method is applicable to both numeric and categorical attributes. The search complexity of the GCT traversal algorithm is provided. Results are demonstrated on both artificial and real-life data sets, including four gene expression data sets.
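The abstract does not define the sparsity factor or the layout of the GCT, so the following is only a minimal sketch under stated assumptions. It assumes each attribute is split into `phi` equi-depth ranges and that sparsity takes the commonly used form (n(D) - N f^k) / sqrt(N f^k (1 - f^k)) with f = 1/phi, where n(D) is the number of points in a k-dimensional grid cell D and N is the data set size. The class `GridCountTrie` and the helpers `equi_depth_cells` and `sparsity` are hypothetical names for illustration; the trie is a simple stand-in for a grid count structure, not the paper's GCT.

```python
import math


def equi_depth_cells(values, phi):
    """Assign each value a range index in 0..phi-1 by rank (equi-depth split)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    cells = [0] * len(values)
    for rank, i in enumerate(order):
        cells[i] = rank * phi // len(values)
    return cells


class GridCountTrie:
    """Hypothetical count trie: one level per dimension, one child per range index."""

    def __init__(self):
        self.count = 0      # number of points passing through this node
        self.children = {}  # range index -> GridCountTrie

    def insert(self, cell_ids):
        """Add one point, given its range index in every dimension."""
        node = self
        node.count += 1
        for c in cell_ids:
            node = node.children.setdefault(c, GridCountTrie())
            node.count += 1

    def query(self, pattern):
        """Count points in a projected cell; None marks an unconstrained dimension."""
        if all(p is None for p in pattern):
            return self.count  # nothing left to constrain below this node
        head, rest = pattern[0], pattern[1:]
        if head is None:
            return sum(child.query(rest) for child in self.children.values())
        child = self.children.get(head)
        return child.query(rest) if child else 0


def sparsity(n_cell, n_total, phi, k):
    """Assumed sparsity form: negative values mean fewer points than expected."""
    f_k = (1.0 / phi) ** k
    expected = n_total * f_k
    return (n_cell - expected) / math.sqrt(expected * (1.0 - f_k))
```

In this sketch a projection is written as a pattern such as `(None, 1)`, meaning "any range in the first dimension, range 1 in the second". An evolutionary search could score candidate patterns with `sparsity(tree.query(pattern), N, phi, k)` and favor the most negative values, although the paper's actual encoding and crossover may differ.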
Keywords: Deviation detection, Gene expression, Genetic algorithm, Grid count tree, Projected dimension, Outlier
Review history: Received 26 April 2006, Revised 27 September 2007, Accepted 4 October 2007, Available online 11 October 2007.
Official paper URL: https://doi.org/10.1016/j.patcog.2007.10.003