Mining interestingness measures for string pattern mining

Authors:

Abstract

A novel method of detecting interesting patterns in strings is presented. A common way to refine the results of pattern mining algorithms is to use interestingness measures. However, the set of appropriate measures differs for each domain and problem. The aim of our research was to develop a model that classifies patterns according to their interestingness. The method is based on the application of machine learning algorithms to a dataset generated from factor features. Each dataset row is associated with a factor of a string and contains values for different interestingness measures and contextual information. We also propose a new interestingness measure based on an entropy principle, which improves the classification results. With the proposed method, experts need not configure parameters to obtain interesting patterns. We demonstrate the utility of the method by presenting an example of the results for real data. The datasets and scripts required to reproduce the experiments are available online.
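
To make the idea of "one dataset row per factor" concrete, the following is a minimal sketch of how such a table might be built. It is not the authors' implementation: the function name `factor_rows`, the `max_len` cutoff, and the chosen columns are assumptions made for illustration. In particular, `next_char_entropy` (the Shannon entropy of the character that follows each occurrence of a factor) is only a simple stand-in for an entropy-style measure; the abstract does not define the measure the paper proposes.

```python
import math
from collections import Counter, defaultdict


def factor_rows(text, max_len=3):
    """Build one feature row per distinct factor (substring) of `text`.

    Hypothetical helper: the columns are illustrative, not the paper's.
    """
    counts = Counter()                 # factor -> number of occurrences
    followers = defaultdict(Counter)   # factor -> counts of the next character
    n = len(text)

    # Enumerate every factor of length 1..max_len at every start position.
    for i in range(n):
        for k in range(1, max_len + 1):
            if i + k > n:
                break
            factor = text[i:i + k]
            counts[factor] += 1
            if i + k < n:
                followers[factor][text[i + k]] += 1

    rows = []
    for factor, count in counts.items():
        positions = n - len(factor) + 1    # possible start positions
        nxt = followers[factor]
        total = sum(nxt.values())
        # Shannon entropy (in bits) of the following-character distribution.
        entropy = 0.0
        if total:
            entropy = -sum((v / total) * math.log2(v / total)
                           for v in nxt.values())
        rows.append({
            "factor": factor,
            "length": len(factor),
            "count": count,
            "rel_freq": count / positions,
            "next_char_entropy": entropy,
        })
    return rows


if __name__ == "__main__":
    for row in factor_rows("abac", max_len=2):
        print(row)
```

Rows of this shape, once labelled as interesting or not, could be passed to any standard classifier. The quadratic enumeration is kept for readability; a suffix-based index would be the usual choice for long strings.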

Keywords: String mining, Association rules, Interestingness measures, Pattern mining, Data mining

Publication history: Available online 4 February 2011.

Official paper URL: https://doi.org/10.1016/j.knosys.2011.01.013