Online Rebuilding Regression Random Forests

Authors:

Highlights:

Abstract

Mining continuous data streams is particularly challenging for machine learning. Much effort has been devoted to proposing online learning algorithms that train iteratively on newly arriving data and provide evolving predictions. Compared with off-line approaches, these algorithms have shown better predictive performance and some ability to adapt to high-volume continuous data streams. However, a wide range of practical applications call for regression models that can make adequate use of a large volume of pre-collected training data while also handling continuous data streams with multiple types of concept drift, such as abrupt, gradual, incremental, and recurring drift. Random Forests (RFs) are an effective ensemble learning model for regression tasks, but the fixed structure that results from off-line training restricts their applicability to real-world tasks with dynamic data streams. To address these issues, we propose an online rebuilding strategy for pre-trained Random Forests models, called Online Rebuilding Regression Random Forests (ORB-RRF). Specifically, a leaf-pruning technique and the online reconstruction of subtrees, based on changes in the feature space at certain nodes, are designed to adjust the local structure of each regression tree so that it adapts to dynamic inputs. Numerical experiments show that ORB-RRF remarkably improves adaptability to data streams and predictive accuracy on several real and synthetic benchmark datasets, compared with several state-of-the-art methods. Moreover, we show the convergence and stability of the proposed method.
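
The general idea of locally rebuilding a pre-trained regression tree under drift can be illustrated with a minimal sketch. This is not the authors' ORB-RRF algorithm: it uses a single one-dimensional tree rather than a forest, omits leaf pruning, and swaps in a simple error-based trigger (mean absolute error over a sliding window) in place of the feature-space criterion mentioned in the abstract. All names (`Node`, `fit_stump`, `window`, `tol`) and the toy stream are assumptions made for illustration.

```python
from collections import deque
from statistics import fmean


def fit_stump(samples):
    """Find the single split on 1-D (x, y) samples that minimizes squared error.

    Returns (threshold, left_mean, right_mean), or None if no split is possible.
    """
    samples = sorted(samples)
    best = None
    for i in range(1, len(samples)):
        if samples[i - 1][0] == samples[i][0]:
            continue  # cannot split between identical x values
        left = [y for _, y in samples[:i]]
        right = [y for _, y in samples[i:]]
        ml, mr = fmean(left), fmean(right)
        sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, (samples[i - 1][0] + samples[i][0]) / 2, ml, mr)
    return None if best is None else best[1:]


class Node:
    """Hypothetical tree node: a leaf until it is rebuilt into a subtree."""

    def __init__(self, value, window=30, tol=1.0):
        self.value = value            # constant prediction while this is a leaf
        self.threshold = None         # set once the leaf is rebuilt as a split
        self.left = self.right = None
        self.recent = deque(maxlen=window)  # recent (x, y) pairs routed here
        self.tol = tol                # assumed error tolerance before rebuilding

    def _child(self, x):
        return self.left if x <= self.threshold else self.right

    def predict(self, x):
        return self.value if self.threshold is None else self._child(x).predict(x)

    def update(self, x, y):
        if self.threshold is not None:
            self._child(x).update(x, y)
            return
        self.recent.append((x, y))
        if len(self.recent) < self.recent.maxlen:
            return  # wait until the window is full
        error = fmean(abs(yi - self.value) for _, yi in self.recent)
        if error > self.tol:
            self._rebuild()

    def _rebuild(self):
        stump = fit_stump(list(self.recent))
        window = self.recent.maxlen
        if stump is None:
            self.value = fmean(y for _, y in self.recent)  # only refresh the value
        else:
            self.threshold, ml, mr = stump
            self.left = Node(ml, window, self.tol)
            self.right = Node(mr, window, self.tol)
        self.recent.clear()


# "Pre-trained" leaf from an off-line concept where y was always 0.
tree = Node(value=0.0)
before = tree.predict(0.2)

# Abrupt drift: the target becomes a step function of x.
for i in range(60):
    x = (i % 10) / 10
    tree.update(x, 10.0 if x > 0.5 else -10.0)

print(before, tree.predict(0.2), tree.predict(0.8))
```

In this toy stream the stale leaf is replaced by a two-leaf subtree once its window fills with high-error samples, while the rest of the tree (here there is none) would be left untouched. A real forest would apply such local updates per tree and average the predictions.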

Keywords: Online rebuilding, Leaf pruning, Regression Random Forests, Data stream, Concept drift

Review history: Received 10 October 2020, Revised 3 February 2021, Accepted 15 March 2021, Available online 17 March 2021, Version of Record 25 March 2021.

Official paper URL: https://doi.org/10.1016/j.knosys.2021.106960