ADD: a new average divergence difference-based outlier detection method with skewed distribution of data objects

作者:Zhong-Yang Xiong, Qin-Qin Gao, Qiang Gao, Yu-Fang Zhang, Lin-Tao Li, Min Zhang

摘要

Outlier detection is of vital importance in data mining tasks, with numerous applications, including video surveillance and credit card fraud detection. Quite a few outlier detection algorithms have been developed and have received considerable attention, and most existing methods are classified as distance-based algorithms and density-based algorithms. However, both of these approaches have some flaws. The former has difficulty detecting local outliers, and the latter cannot handle low-density pattern problems. Moreover, outlier detection algorithms are sensitive to parameter settings. This paper proposes a simple and efficient outlier detection approach (called ADD) based on the average divergence difference of data objects; in this method there is no need to artificially define the number of neighbors of objects k to solve the above issues. In this algorithm, two new measures, called the divergence factor (DF) and the average divergence difference (LADD), are developed based on the skewed distribution characteristics of data objects and their natural neighbors, thus improving the accuracy of local outlier detection from an innovative research perspective. These factors are presented as external and internal characterization factors because the former characterizes the skew distribution characteristics and compactness relationship of data objects and the latter represents the difference in the skew distribution characteristics of data objects in a neighborhood. Then, we set an appropriate threshold to distinguish whether a data point is an outlier, which eliminates the interference of the Top-N problem. Finally, the final experimental results show that the ADD algorithm achieves an overall improvement in local outlier detection, especially in the detection of outliers in some datasets with complex distributions and in low-density areas, compared to that achieved by state-of-the-art algorithms.

论文关键词:Outlier detection, Divergence factor, Average divergence difference, Data mining

论文评审过程:

论文官网地址:https://doi.org/10.1007/s10489-021-02399-y