On Incremental Learning for Gradient Boosting Decision Trees
Authors: Chongsheng Zhang, Yuan Zhang, Xianjin Shi, George Almpanidis, Gaojuan Fan, Xiajiong Shen
Abstract
Boosting algorithms, a class of ensemble learning methods, have become very popular for data classification owing to their strong theoretical guarantees and outstanding prediction performance. However, most boosting algorithms were designed for static data, so they cannot be applied directly to online or incremental learning. In this paper, we propose iGBDT, a novel algorithm that incrementally updates a classification model built with gradient boosting decision trees (GBDT). The main idea of iGBDT is to learn the new model incrementally, rather than rerunning GBDT from scratch, whenever a new batch of data arrives. We conduct large-scale experiments to validate the effectiveness and efficiency of iGBDT. All experimental results show that iGBDT needs significantly less model building/updating time than the conventional practice of rerunning GBDT from scratch on each new batch, while keeping the same classification accuracy. iGBDT can be used in many applications that require timely analysis of continuously arriving or real-time user-generated data, such as behavioural targeting, Internet advertising and recommender systems.
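The abstract does not spell out how iGBDT updates the ensemble, so what follows is not the paper's algorithm. It is a minimal pure-Python sketch, assuming squared loss, a single numeric feature and depth-1 trees (stumps), that contrasts the conventional practice the abstract describes (rebuilding GBDT from scratch on all data whenever a batch arrives) with a simple, swapped-in warm-start strategy: keep the existing trees and add only a few new ones fitted to the residuals on the combined data. All names (`fit_stump`, `train_from_scratch`, `warm_start_update`) and the toy data are hypothetical.

```python
import copy
from dataclasses import dataclass, field
from typing import List


@dataclass
class Stump:
    """Depth-1 regression tree: returns `left` if x <= threshold, else `right`."""
    threshold: float
    left: float
    right: float

    def predict(self, x: float) -> float:
        return self.left if x <= self.threshold else self.right


def fit_stump(xs, residuals) -> Stump:
    """Pick the single split on x that minimises squared error on the residuals."""
    pairs = sorted(zip(xs, residuals))
    best_sse, best = None, None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold separates equal x values
        left = [r for _, r in pairs[:i]]
        right = [r for _, r in pairs[i:]]
        lmean, rmean = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lmean) ** 2 for r in left) + sum((r - rmean) ** 2 for r in right)
        if best_sse is None or sse < best_sse:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best_sse, best = sse, Stump(threshold, lmean, rmean)
    if best is None:  # all x identical: fall back to a constant prediction
        mean = sum(residuals) / len(residuals)
        return Stump(float("inf"), mean, mean)
    return best


@dataclass
class GBDT:
    """Squared-loss gradient boosting: each stump is fitted to the current residuals."""
    base: float = 0.0
    learning_rate: float = 0.3
    stumps: List[Stump] = field(default_factory=list)

    def predict(self, x: float) -> float:
        return self.base + self.learning_rate * sum(s.predict(x) for s in self.stumps)

    def boost(self, xs, ys, rounds: int) -> "GBDT":
        for _ in range(rounds):
            # For the loss (y - F(x))**2 / 2, the negative gradient is y - F(x).
            residuals = [y - self.predict(x) for x, y in zip(xs, ys)]
            self.stumps.append(fit_stump(xs, residuals))
        return self


def train_from_scratch(xs, ys, rounds: int) -> GBDT:
    """Conventional practice: ignore any old model and rebuild on all data."""
    return GBDT(base=sum(ys) / len(ys)).boost(xs, ys, rounds)


def warm_start_update(old: GBDT, xs, ys, extra_rounds: int) -> GBDT:
    """Hypothetical update: copy the old model, keep its stumps, add a few more."""
    return copy.deepcopy(old).boost(xs, ys, extra_rounds)


def mse(model: GBDT, xs, ys) -> float:
    return sum((model.predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Made-up toy data: a first batch, then a new batch arrives.
xs1 = [float(i) for i in range(20)]
ys1 = [0.0 if x < 10 else 1.0 for x in xs1]
xs2 = [float(i) for i in range(20, 30)]
ys2 = [2.0] * len(xs2)
all_x, all_y = xs1 + xs2, ys1 + ys2

old = train_from_scratch(xs1, ys1, rounds=30)
scratch = train_from_scratch(all_x, all_y, rounds=30)            # 30 new stump fits
updated = warm_start_update(old, all_x, all_y, extra_rounds=10)  # 10 new stump fits
```

With these toy batches the update fits 10 new stumps instead of 30 because it reuses the trees already learned on the first batch. The trade-off is that the base value and early stumps were fitted to the first batch only; the abstract reports that iGBDT keeps the accuracy of retraining, whereas this naive warm start comes with no such guarantee.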
Keywords: Gradient boosting, Gradient boosting decision tree, Incremental learning, Ensemble learning
Paper URL: https://doi.org/10.1007/s11063-019-09999-3