High average-utility sequential pattern mining based on uncertain databases

作者:Jerry Chun-Wei Lin, Ting Li, Matin Pirouz, Ji Zhang, Philippe Fournier-Viger

摘要

The emergence and proliferation of the internet of things (IoT) devices have resulted in the generation of big and uncertain data due to the varied accuracy and decay of sensors and their different sensitivity ranges. Since data uncertainty plays an important role in IoT data, mining the useful information from uncertain dataset has become an important issue in recent decades. Past works focus on mining the high sequential patterns from the uncertain database. However, the utility of a derived sequence increases along with the size of the sequence, which is an unfair measure to evaluate the utility of a sequence since any combination of a high-utility sequence will also be the high-utility sequence, even though the utility of a sequence is merely low. In this paper, we address the limitation of the previous potential high-utility sequential pattern mining and present a potentially high average-utility sequential pattern mining framework for discovering the set of potentially high average-utility sequential patterns (PHAUSPs) from the uncertain dataset by considering the size of a sequence, which can provide a fair measure of the patterns than the previous works. First, a baseline potentially high average-utility sequential pattern algorithm and three pruning strategies are introduced to completely mine the set of the desired PHAUSPs. To reduce the computational cost and accelerate the mining process, a projection algorithm called PHAUP is then designed, which leads to a reduction in the size of candidates of the desired patterns. Several experiments in terms of runtime, number of candidates, memory overhead, number of discovered pattern, and scalability are then evaluated on both real-life and artificial datasets, and the results showed that the proposed algorithm achieves promising performance, especially the PHAUP approach.

论文关键词:High average-utility sequential pattern mining, Sequential patterns, Uncertain database, Data mining

论文评审过程:

论文官网地址:https://doi.org/10.1007/s10115-019-01385-8