Resolution-based outlier factor: detecting the top-n most outlying data points in engineering data
Authors: Hongqin Fan, Osmar R. Zaïane, Andrew Foss, Junfeng Wu
Abstract
One common endeavour in engineering applications is outlier detection, which aims to identify inconsistent records in large amounts of data. Outlier detection schemes from the data mining discipline are acknowledged as a more viable way to efficiently identify anomalies in these data repositories. However, current outlier mining algorithms require domain parameters as input. These parameters are often unknown and difficult to determine, and they vary across datasets with different cluster features. This paper presents a novel resolution-based outlier notion and a nonparametric outlier mining algorithm that can efficiently identify and rank the top-listed outliers in a wide variety of datasets. The algorithm produces reasonable outlier results by taking both the local and the global features of a dataset into account. Experiments are conducted on synthetic datasets and on a real-life construction equipment dataset from a large road-building contractor. Comparison with current outlier mining algorithms indicates that the proposed algorithm is more effective and can be integrated into a decision support system to serve as a universal detector of potentially inconsistent records.
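To make the idea of a "resolution-based" notion more concrete, here is a minimal, hypothetical sketch. It is not the paper's exact definition. It assumes that "resolution" means a linking radius swept from fine to coarse, and that at each radius points are grouped by single-linkage connectivity. It also assumes a score that grows when a point already belongs to a larger cluster before the next merge. Under this sketch's convention, points that stay isolated across many resolutions get the lowest scores and are ranked as the top-n outliers. The names `resolution_scores` and `top_n_outliers` and the `steps` granularity are illustrative assumptions. Note that `steps` only sets how finely the radius range is sampled and is derived from the data's own extent.

```python
import math
from itertools import combinations


def _cluster_sizes_at_radius(points, radius):
    """For each point, return the size of its cluster when every pair of
    points within `radius` is linked (single-linkage connected components)."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(points)), 2):
        if math.dist(points[i], points[j]) <= radius:
            parent[find(i)] = find(j)

    roots = [find(i) for i in range(len(points))]
    counts = {}
    for r in roots:
        counts[r] = counts.get(r, 0) + 1
    return [counts[r] for r in roots]


def resolution_scores(points, steps=10):
    """Hypothetical resolution-based score: sweep the radius from 0 to the
    largest pairwise distance and accumulate (previous size - 1) / current size."""
    max_d = max(math.dist(p, q) for p, q in combinations(points, 2))
    radii = [max_d * k / steps for k in range(steps + 1)]
    prev = _cluster_sizes_at_radius(points, radii[0])
    scores = [0.0] * len(points)
    for r in radii[1:]:
        cur = _cluster_sizes_at_radius(points, r)
        for i in range(len(points)):
            scores[i] += (prev[i] - 1) / cur[i]
        prev = cur
    return scores


def top_n_outliers(points, n, steps=10):
    """Indices of the n points with the lowest scores (most outlying here)."""
    scores = resolution_scores(points, steps)
    return sorted(range(len(points)), key=lambda i: scores[i])[:n]


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (1, 1), (8, 9)]
    print(top_n_outliers(data, 1))  # the isolated point (8, 9) has index 4
```

The pairwise loop is O(steps · n²) and is only meant to show the scoring logic on small inputs. A practical implementation would need a spatial index or an incremental merge structure.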
Keywords: Outlier Detection, Mining Algorithm, Synthetic Dataset, Engineering Data, Local Outlier
Paper URL: https://doi.org/10.1007/s10115-008-0145-3