Efficient mining of class association rules with the itemset constraint

Abstract

Mining class association rules (CARs) with the itemset constraint is concerned with discovering rules that contain a set of specific items in the rule antecedent and a class label in the rule consequent. This task is commonly encountered in mining medical data. For example, when identifying which sections of the population are at high risk of HIV infection, epidemiologists often concentrate on rules that include demographic information such as gender, age, and marital status in the rule antecedent and HIV-Positive in the rule consequent. There are two naive strategies for this problem: pre-processing and post-processing. Post-processing methods must generate and examine a huge number of candidate CARs, while the performance of pre-processing methods depends on the number of records filtered out. Both approaches are therefore time-consuming. This study proposes an efficient method for mining CARs with the itemset constraint based on a lattice structure and the difference between two sets of object identifiers (diffset). First, a lattice structure is built to store all frequent itemsets in the dataset; to reduce memory usage, diffsets are stored instead of the full sets of object identifiers. Second, the lattice is traversed to generate only the rules that satisfy the itemset constraint. The experimental results show that the proposed algorithm outperforms existing methods in terms of both mining time and memory usage.
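The two steps above can be illustrated with a minimal Python sketch. It is not the authors' algorithm: it enumerates frequent itemsets in an Eclat-style prefix tree with diffsets (in the spirit of Zaki's dEclat) rather than building the lattice described in the abstract, and the toy dataset, the function `mine_cars` and its parameters are hypothetical. It relies on the standard diffset identities d(PXY) = d(PY) - d(PX) and sup(PXY) = sup(PX) - |d(PXY)|; per-class counts are updated the same way by intersecting each diffset with the record set of every class. Rules are emitted only for antecedents that contain the constraint itemset.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

# Hypothetical toy dataset: (items of one record, class label).
DATA: List[Tuple[Set[str], str]] = [
    ({"gender=M", "age=young", "married=no"}, "pos"),
    ({"gender=M", "age=young", "married=yes"}, "pos"),
    ({"gender=F", "age=old", "married=yes"}, "neg"),
    ({"gender=M", "age=old", "married=no"}, "neg"),
    ({"gender=M", "age=young", "married=no"}, "pos"),
]

# (antecedent itemset, class label, rule support, confidence)
Rule = Tuple[FrozenSet[str], str, int, float]


def mine_cars(data, constraint: FrozenSet[str], min_sup: int, min_conf: float) -> List[Rule]:
    """Sketch of constrained CAR mining with diffsets (Eclat-style prefix tree)."""
    assert min_sup >= 1, "min_sup is an absolute record count"
    all_tids = set(range(len(data)))
    class_tids: Dict[str, Set[int]] = {}
    for tid, (_, label) in enumerate(data):
        class_tids.setdefault(label, set()).add(tid)

    def class_counts(parent: Dict[str, int], diff: Set[int]) -> Dict[str, int]:
        # t(X) = t(parent) - d(X) and d(X) is a subset of t(parent),
        # so each class count drops by the diffset records of that class.
        return {c: parent[c] - len(diff & tids) for c, tids in class_tids.items()}

    # Level 1: diffset relative to the empty prefix, d(X) = T - t(X).
    root_counts = {c: len(tids) for c, tids in class_tids.items()}
    level1 = []
    for item in sorted(set().union(*(items for items, _ in data))):
        diff = all_tids - {tid for tid, (items, _) in enumerate(data) if item in items}
        sup = len(all_tids) - len(diff)
        if sup >= min_sup:
            level1.append((frozenset([item]), diff, sup, class_counts(root_counts, diff)))

    rules: List[Rule] = []

    def extend(siblings) -> None:
        for i, (x_items, x_diff, x_sup, x_counts) in enumerate(siblings):
            # Only antecedents containing the constraint itemset yield rules.
            if constraint <= x_items:
                for label, count in x_counts.items():
                    conf = count / x_sup
                    if count >= min_sup and conf >= min_conf:
                        rules.append((x_items, label, count, conf))
            children = []
            for y_items, y_diff, _, _ in siblings[i + 1:]:
                # d(PXY) = d(PY) - d(PX) and sup(PXY) = sup(PX) - |d(PXY)|.
                diff = y_diff - x_diff
                sup = x_sup - len(diff)
                if sup >= min_sup:
                    children.append((x_items | y_items, diff, sup, class_counts(x_counts, diff)))
            extend(children)

    extend(level1)
    return rules


if __name__ == "__main__":
    for antecedent, label, sup, conf in mine_cars(DATA, frozenset({"gender=M"}), 2, 0.6):
        print(sorted(antecedent), "->", label, f"sup={sup} conf={conf:.2f}")
```

With the constraint {gender=M}, itemsets such as {age=young, married=no} are still enumerated, because they are frequent and needed to extend the tree, but they produce no rules. Pruning here uses the itemset support sup(X) >= min_sup, which is safe but looser than pruning on the best class count.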

Keywords: Associative classification, Class association rule, Data mining, Useful rules

Article history: Received 31 August 2015, Revised 23 March 2016, Accepted 25 March 2016, Available online 14 April 2016, Version of Record 5 May 2016.

Official paper page: https://doi.org/10.1016/j.knosys.2016.03.025