A comparison of outlier detection algorithms for ITS data

作者:

Highlights:

摘要

In order to improve the veracity and reliability of a traffic model built, or to extract important and valuable information from collected traffic data, the technique of outlier mining has been introduced into the traffic engineering domain for detecting and analyzing the outliers in traffic data sets. Three typical outlier algorithms, respectively the statistics-based approach, the distance-based approach, and the density-based local outlier approach, are described with respect to the principle, the characteristics and the time complexity of the algorithms. A comparison among the three algorithms is made through application to intelligent transportation systems (ITS). Two traffic data sets with different dimensions have been used in our experiments carried out, one is travel time data, and the other is traffic flow data. We conducted a number of experiments to recognize outliers hidden in the data sets before building the travel time prediction model and the traffic flow foundation diagram. In addition, some artificial generated outliers are introduced into the traffic flow data to see how well the different algorithms detect them. Three strategies-based on ensemble learning, partition and average LOF have been proposed to develop a better outlier recognizer. The experimental results reveal that these methods of outlier mining are feasible and valid to detect outliers in traffic data sets, and have a good potential for use in the domain of traffic engineering. The comparison and analysis presented in this paper are expected to provide some insights to practitioners who plan to use outlier mining for ITS data.

论文关键词:Outlier detection,Traffic data,Statistics-based,Distance-based,Density-based

论文评审过程:Available online 24 June 2009.

论文官网地址:https://doi.org/10.1016/j.eswa.2009.06.008