Post-pruning in regression tree induction: An integrated approach

Abstract

The regression tree (RT) induction process has two major phases: the growth phase and the pruning phase. The pruning phase aims to generalize the RT produced in the growth phase by generating a subtree that avoids over-fitting to the training data. Most post-pruning methods treat post-pruning as a single-objective problem (i.e., maximize validation accuracy) and consider simplicity (in terms of the number of leaves) only in the case of a tie. However, it is well known that, apart from accuracy, other performance measures (e.g., stability, simplicity) are important for evaluating decision tree (DT) quality. In this paper we present an integrated approach to the post-pruning phase that simultaneously accommodates multiple performance measures that are important for evaluating RT quality, and obtains the optimal subtree based on user-provided preference and value function information.

Keywords: Regression tree, Decision tree, Post-pruning, Data mining, Performance measures, Multi-objective programming, Mixed integer programming, Analytic hierarchy process

Review history: Available online 30 January 2007.

Official URL: https://doi.org/10.1016/j.eswa.2007.01.017