Bias in information-based measures in decision tree induction
Authors: Allan P. White, Wei Zhong Liu
Abstract
A fresh look is taken at the problem of bias in information-based attribute selection measures, used in the induction of decision trees. The approach uses statistical simulation techniques to demonstrate that the usual measures such as information gain, gain ratio, and a new measure recently proposed by Lopez de Mantaras (1991) are all biased in favour of attributes with large numbers of values. It is concluded that approaches which utilise the chi-square distribution are preferable because they compensate automatically for differences between attributes in the number of levels they take.
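The bias described in the abstract can be illustrated with a small simulation. The following is a minimal sketch, not the simulation design used in the paper: the two-class setup, the sample size of 200 cases, the 500 trials, and the helper names `entropy`, `information_gain`, and `mean_gain` are all assumptions chosen for illustration. Each attribute is generated independently of the class, so any information gain it shows is spurious.

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(attr, labels):
    """Entropy of the labels minus the expected entropy after splitting on attr."""
    n = len(labels)
    groups = {}
    for a, y in zip(attr, labels):
        groups.setdefault(a, []).append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def mean_gain(n_values, n_cases=200, n_trials=500, seed=0):
    """Average information gain of a random attribute that is unrelated to the class.

    All parameters are illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trials):
        labels = [rng.randrange(2) for _ in range(n_cases)]       # random binary class
        attr = [rng.randrange(n_values) for _ in range(n_cases)]  # independent attribute
        total += information_gain(attr, labels)
    return total / n_trials


gain_2 = mean_gain(2)    # attribute with 2 values
gain_10 = mean_gain(10)  # attribute with 10 values
print(f"mean gain, 2 values:  {gain_2:.4f}")
print(f"mean gain, 10 values: {gain_10:.4f}")
```

Although neither attribute carries any information about the class, the attribute with more values should show the larger average gain in this sketch. A chi-square based test accounts for this through its degrees of freedom, which grow with the number of attribute levels.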
Keywords: Decision trees, noise, induction, unbiased attribute selection, information-based measures
Paper URL: https://doi.org/10.1007/BF00993349