A new node splitting measure for decision tree construction

Authors:

Highlights:

Abstract

This paper proposes a new node splitting measure for decision tree induction, termed the distinct class based splitting measure (DCSM), which gives importance to the number of distinct classes in a partition. The measure is the product of two terms. The first term deals with the number of distinct classes in each child partition. As the number of distinct classes in a partition increases, this first term increases, so purer partitions are preferred. The second term decreases when there are more examples of a class compared to the total number of examples in the partition. The combination thus still favors purer partitions. It is shown that the DCSM satisfies two important properties that a split measure should possess, viz. convexity and well-behavedness. Results obtained over several datasets indicate that decision trees induced using the DCSM provide better classification accuracy and are more compact (have fewer nodes) than trees induced using two of the most popular node splitting measures presently in use.
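For orientation, the following is a minimal Python sketch of the quantities the abstract refers to. It does not implement the DCSM, whose formula is not stated in this abstract. It computes only the two textbook baselines named in the keywords (the Gini Index and the Gain Ratio, assumed here to be the two popular measures the abstract compares against) and the number of distinct classes per child partition. All function names are hypothetical and chosen for illustration.

```python
from collections import Counter
from math import log2


def gini(labels):
    """Gini impurity of one partition: 1 - sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels):
    """Shannon entropy (in bits) of the class distribution in one partition."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def weighted_gini(partitions):
    """Gini Index of a split: size-weighted average of child impurities."""
    total = sum(len(p) for p in partitions)
    return sum(len(p) / total * gini(p) for p in partitions)


def gain_ratio(parent, partitions):
    """Gain Ratio of a split: information gain divided by split information."""
    total = len(parent)
    gain = entropy(parent) - sum(len(p) / total * entropy(p) for p in partitions)
    split_info = -sum(
        len(p) / total * log2(len(p) / total) for p in partitions if p
    )
    return gain / split_info if split_info > 0 else 0.0


def distinct_class_counts(partitions):
    """Number of distinct classes in each child partition (illustrative helper)."""
    return [len(set(p)) for p in partitions]


parent = ["a", "a", "b", "b"]
pure_split = [["a", "a"], ["b", "b"]]   # each child holds a single class
mixed_split = [["a", "b"], ["a", "b"]]  # each child holds both classes

print(weighted_gini(pure_split), gain_ratio(parent, pure_split))
print(weighted_gini(mixed_split), gain_ratio(parent, mixed_split))
print(distinct_class_counts(pure_split), distinct_class_counts(mixed_split))
```

In this toy example the pure split has fewer distinct classes per child than the mixed split, which is the property the first DCSM term is described as rewarding.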

Keywords: Decision trees, Node splitting measure, Gini Index, Gain Ratio

Review history: Received 8 March 2009, Revised 12 February 2010, Accepted 28 February 2010, Available online 15 March 2010.

Paper URL: https://doi.org/10.1016/j.patcog.2010.02.025