Process mining on noisy logs — Can log sanitization help to improve performance?

Authors:

Abstract

Process mining techniques read process logs and extract process models from them. However, real-world logs are often noisy, and such logs yield poor, spaghetti-like process models. We propose a technique to sanitize noisy logs: we first build a classifier on a subset of the log and then apply the classifier's rules to remove noisy traces from the log. We evaluate the improvement in the quality of the resulting process models on synthetic logs from benchmark models of increasing complexity, using both behavioral and structural recall and precision metrics. The results show that models mined from such preprocessed logs are superior on several evaluation metrics: they show better fidelity to the reference models and are more compact, with fewer elements. Because the nature of noise varies from one log to another, a useful feature of the rule-based approach is that it generalizes to any noise pattern. The rules can also be explained and may be further modified manually. We also give results from experiments with a real dataset.
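
The sanitization idea in the abstract (learn rules from a labeled subset of the log, then drop the traces those rules flag) can be illustrated with a minimal sketch. This is not the paper's classifier: as a stand-in, it assumes that a "rule" is a directly-follows pair of activities that occurs only in traces labeled noisy. The names `directly_follows`, `learn_rules`, and `sanitize`, and the toy activities `a` to `d`, are hypothetical.

```python
from typing import List, Set, Tuple

Trace = Tuple[str, ...]
Pair = Tuple[str, str]


def directly_follows(trace: Trace) -> Set[Pair]:
    """Return the set of (x, y) pairs where y immediately follows x."""
    return set(zip(trace, trace[1:]))


def learn_rules(labeled_sample: List[Tuple[Trace, bool]]) -> Set[Pair]:
    """Learn rules from a labeled subset of the log.

    Assumption: a rule is a directly-follows pair seen in at least one
    noisy trace and in no clean trace of the sample.
    """
    clean_pairs: Set[Pair] = set()
    noisy_pairs: Set[Pair] = set()
    for trace, is_noisy in labeled_sample:
        target = noisy_pairs if is_noisy else clean_pairs
        target.update(directly_follows(trace))
    return noisy_pairs - clean_pairs


def sanitize(log: List[Trace], rules: Set[Pair]) -> List[Trace]:
    """Keep only the traces that trigger none of the learned rules."""
    return [t for t in log if not (directly_follows(t) & rules)]


# Toy labeled subset: (trace, is_noisy)
sample = [
    (("a", "b", "c", "d"), False),
    (("a", "c", "b", "d"), False),
    (("a", "d", "b", "c"), True),
    (("a", "b", "b", "d"), True),
]
rules = learn_rules(sample)

# Full log to be cleaned before it is handed to a mining algorithm
log = [
    ("a", "b", "c", "d"),
    ("a", "d", "b", "c"),
    ("a", "c", "b", "d"),
    ("a", "c", "b", "b", "d"),
    ("a", "b", "d"),
]
clean_log = sanitize(log, rules)
```

Because each rule is a readable pair such as `("b", "b")`, an analyst could inspect the learned set and add or remove entries by hand before sanitizing, in the spirit of the explainable, manually modifiable rules the abstract mentions.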

Keywords: Process mining, Benchmarking, Noisy data, Log sanitization, Metrics, Rules

Review history: Received 4 November 2014, Revised 10 August 2015, Accepted 12 August 2015, Available online 21 August 2015, Version of Record 9 September 2015.

Official paper URL: https://doi.org/10.1016/j.dss.2015.08.003