Evolving model trees for mining data sets with continuous-valued classes

Abstract

This paper presents a genetic programming (GP) approach, called GPMCC, for extracting symbolic rules from data sets with continuous-valued classes. GPMCC uses a genetic algorithm (GA) to evolve multivariate non-linear models [Potgieter, G., & Engelbrecht, A. (2007). Genetic algorithms for the structural optimisation of learned polynomial expressions. Applied Mathematics and Computation] at the terminal nodes of the GP. Several mechanisms have been developed to optimise the GP. These include a fragment pool of candidate non-linear models, k-means clustering of the training data to support stratified sampling, and specialised mutation and crossover operators that evolve structurally optimal and accurate models. GPMCC is shown to be insensitive to control parameter values. Experimental results show that the accuracy of GPMCC is comparable to that of NeuroLinear and Cubist, while GPMCC produces significantly fewer rules with less complex antecedents.
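
To make the idea of a model tree more concrete, here is a minimal sketch. It assumes a binary tree whose internal nodes test one attribute against a threshold and whose leaves hold a polynomial in the input attributes. Each root-to-leaf path then reads as one symbolic rule, of the form "IF antecedent THEN polynomial model". The names Leaf, Split, predict and extract_rules are hypothetical and are not GPMCC's actual representation. The GA that evolves the leaf polynomials and the GP operators are not shown.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Dict, List, Tuple, Union

# Hypothetical representation: a term is (coefficient, {attribute index: exponent});
# an empty exponent map is the constant term.
Term = Tuple[float, Dict[int, int]]


@dataclass
class Leaf:
    """Terminal node holding a multivariate polynomial model."""
    terms: List[Term]

    def predict(self, x: List[float]) -> float:
        # Sum of coefficient * product of x[i] ** exponent over all terms.
        total = 0.0
        for coef, powers in self.terms:
            value = coef
            for i, p in powers.items():
                value *= x[i] ** p
            total += value
        return total


@dataclass
class Split:
    """Internal node: go left if x[attr] <= threshold, else right."""
    attr: int
    threshold: float
    left: Node
    right: Node


Node = Union[Leaf, Split]


def predict(node: Node, x: List[float]) -> float:
    # Walk down the tree until a leaf model is reached, then evaluate it.
    while isinstance(node, Split):
        node = node.left if x[node.attr] <= node.threshold else node.right
    return node.predict(x)


def extract_rules(node: Node, path: tuple = ()) -> list:
    # Each root-to-leaf path becomes one rule: (antecedent conditions, leaf model).
    if isinstance(node, Leaf):
        return [(list(path), node)]
    go_left = (node.attr, "<=", node.threshold)
    go_right = (node.attr, ">", node.threshold)
    return (extract_rules(node.left, path + (go_left,))
            + extract_rules(node.right, path + (go_right,)))


# Usage: IF x0 <= 2 THEN y = 1 + 3*x1, ELSE y = 0.5*x0^2
tree = Split(attr=0, threshold=2.0,
             left=Leaf([(1.0, {}), (3.0, {1: 1})]),
             right=Leaf([(0.5, {0: 2})]))
print(predict(tree, [1.0, 4.0]))  # 13.0
print(len(extract_rules(tree)))   # 2 rules, each with a one-condition antecedent
```

Counting the rules and the conditions in each antecedent, as extract_rules allows, corresponds to the rule-set size and antecedent complexity that the comparison with NeuroLinear and Cubist refers to.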

Keywords: Data mining, Continuous-valued classes, Genetic programming, Model trees

Publication history: Available online 11 September 2007.

Official URL: https://doi.org/10.1016/j.eswa.2007.08.060