An effective ensemble pruning algorithm based on frequent patterns

Authors:

Highlights:

Abstract

Ensemble pruning is crucial for balancing predictive accuracy and predictive efficiency. Previous ensemble methods demand large amounts of memory and impose heavy computational burdens when dealing with large-scale datasets, which makes classification inefficient. To address this issue, this paper proposes a novel ensemble pruning algorithm based on frequent pattern mining, called EP-FP. The method maps the dataset and the pruned ensemble to a transactional database in which each transaction corresponds to an instance and each item corresponds to a base classifier. Moreover, a Boolean matrix called the classification matrix is used to compress the classification results produced by the pruned ensemble on the dataset. We thereby transform the ensemble pruning problem into the mining of frequent base classifiers on the classification matrix. Several candidate ensembles are obtained by iteratively and incrementally extracting the base classifiers with better performance. Finally, the final ensemble is selected according to a designed evaluation function. Comparative experiments demonstrate the effectiveness and validity of the EP-FP algorithm for the classification of large-scale datasets.
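
The mapping described in the abstract can be illustrated with a small Python sketch. It is not the authors' implementation: the abstract does not specify how matrix entries are defined, how frequent classifiers are mined, or what the evaluation function is. The sketch assumes that an entry is true when a base classifier labels an instance correctly, grows candidates greedily by joint support, and uses majority-vote accuracy as a stand-in evaluation function. All function names (`classification_matrix`, `support`, `majority_vote_accuracy`, `prune`) are hypothetical.

```python
from collections import Counter


def classification_matrix(predictions, labels):
    """Build the Boolean matrix: one row (transaction) per instance,
    one column (item) per base classifier.
    ASSUMPTION: an entry is True when the classifier is correct."""
    return [[pred[i] == y for pred in predictions]
            for i, y in enumerate(labels)]


def support(matrix, items):
    """Fraction of instances on which every classifier in `items` is correct."""
    if not matrix:
        return 0.0
    hits = sum(all(row[j] for j in items) for row in matrix)
    return hits / len(matrix)


def majority_vote_accuracy(predictions, labels, members):
    """Stand-in evaluation function (ASSUMPTION): accuracy of a plain
    majority vote; ties go to the label encountered first."""
    correct = 0
    for i, y in enumerate(labels):
        votes = Counter(predictions[j][i] for j in members)
        if votes.most_common(1)[0][0] == y:
            correct += 1
    return correct / len(labels)


def prune(predictions, labels, max_size):
    """Grow candidate ensembles one classifier at a time, then return the
    candidate with the best evaluation score.
    ASSUMPTION: greedy growth by joint support replaces the paper's
    unspecified frequent-pattern mining step."""
    matrix = classification_matrix(predictions, labels)
    remaining = set(range(len(predictions)))
    current, candidates = [], []
    while remaining and len(current) < max_size:
        # Add the classifier that keeps the joint support highest.
        best = max(sorted(remaining),
                   key=lambda j: support(matrix, current + [j]))
        current.append(best)
        remaining.remove(best)
        candidates.append(list(current))
    return max(candidates,
               key=lambda c: majority_vote_accuracy(predictions, labels, c))
```

Here `predictions[j][i]` is the label that base classifier `j` assigns to instance `i`, and `prune` returns the column indices of the selected classifiers. Storing only Booleans is what keeps the representation compact: the matrix needs one bit per instance and classifier pair regardless of the number of classes.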

Keywords: Ensemble pruning, Frequent pattern, Large-scale dataset, Transactional database, Boolean matrix

Review history: Received 30 January 2013, Revised 29 October 2013, Accepted 30 October 2013, Available online 9 November 2013.

Official paper URL: https://doi.org/10.1016/j.knosys.2013.10.024