CMRules: Mining sequential rules common to several sequences
作者:
Highlights:
•
摘要
Sequential rule mining is an important data mining task used in a wide range of applications. However, current algorithms for discovering sequential rules common to several sequences use very restrictive definitions of sequential rules, which make them unable to recognize that similar rules can describe a same phenomenon. This can have many undesirable effects such as (1) similar rules that are rated differently, (2) rules that are not found because they are considered uninteresting when taken individually, (3) and rules that are too specific, which makes them less likely to be used for making predictions. In this paper, we address these problems by proposing a more general form of sequential rules such that items in the antecedent and in the consequent of each rule are unordered. We propose an algorithm named CMRules for mining this form of rules. The algorithm proceeds by first finding association rules to prune the search space for items that occur jointly in many sequences. Then it eliminates association rules that do not meet the minimum confidence and support thresholds according to the sequential ordering. We evaluate the performance of CMRules in three different ways. First, we provide an analysis of its time complexity. Second, we compare its performance (in terms of execution time, memory usage and scalability) with an adaptation of an algorithm from the literature that we name CMDeo. For this comparison, we use three real-life public datasets, which have different characteristics and represent three kinds of data. In many cases, results show that CMRules is faster and has a better scalability for low support thresholds than CMDeo. Lastly, we report a successful application of the algorithm in a tutoring agent.
论文关键词:Sequential rule mining,Sequential pattern mining,Association rule mining,Sequence database,Educational data mining
论文评审过程:Available online 20 July 2011.
论文官网地址:https://doi.org/10.1016/j.knosys.2011.07.005