MiFI-Outlier: Minimal infrequent itemset-based outlier detection approach on uncertain data stream

作者:

Highlights:

摘要

Massive outlier detection approaches have been proposed for static datasets in the past twenty years, and they have acquired good achievements. In real life, uncertain data stream is more and more common, but most existing outlier detection approaches were not suitable for uncertain data stream environment. In addition, many outlier detection approaches have not considered the appearing frequency of each element, which resulted the detected outliers not coincide with the definition of outlier. Itemset-based outlier detection approaches provided a good solution for this problem, and they have got more attentions in these years. In this paper, a novel two-step minimal infrequent itemset-based outlier detection approach called MiFI-Outlier is proposed to effectively detect the outliers from uncertain data stream. In itemset mining phase, a matrix-based method called MiFI-UDSM is proposed to mine the minimal infrequent itemsets (MiFIs) from uncertain data stream, and then an improved approach called MiFI-UDSM* is proposed for more effectively mining these minimal infrequent itemsets using the ideas of “item cap” and “support cap”. In outlier detection phase, based on the mined MiFIs, three deviation indices including minimal infrequent itemset deviation index (MiFIDI), similarity deviation index (SDI) and transaction deviation index (TDI) are defined to measure the deviation degree of each transaction, and then the MiFI-Outlier is used to identify the outliers from uncertain data stream. Several experimental studies are conducted on public datasets and synthetic datasets, and the results show that the proposed approaches outperform in infrequent itemset mining phase and outlier detection phase.

论文关键词:Outlier detection,Minimal infrequent itemset mining,Uncertain data stream,Deviation indices,Data mining

论文评审过程:Received 13 April 2019, Revised 20 November 2019, Accepted 24 November 2019, Available online 27 November 2019, Version of Record 8 February 2020.

论文官网地址:https://doi.org/10.1016/j.knosys.2019.105268