Technical Note: Bias in Information-Based Measures in Decision Tree Induction
Authors: Allan P. White, Wei Zhong Liu
Abstract
A fresh look is taken at the problem of bias in information-based attribute selection measures, used in the induction of decision trees. The approach uses statistical simulation techniques to demonstrate that the usual measures such as information gain, gain ratio, and a new measure recently proposed by Lopez de Mantaras (1991) are all biased in favour of attributes with large numbers of values. It is concluded that approaches which utilise the chi-square distribution are preferable because they compensate automatically for differences between attributes in the number of levels they take.
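The kind of simulation the abstract describes can be illustrated with a minimal sketch. It is not the authors' experimental design: the sample size, number of trials, binary class, uniform sampling, and all function names below are assumptions chosen for illustration. It generates attributes that are statistically independent of the class, so any apparent information gain is pure noise, and reports how the mean gain changes with the number of attribute values. For comparison it also reports the Pearson chi-square statistic divided by its degrees of freedom, which should stay close to 1 under independence whatever the number of values.

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(attr, labels):
    """Entropy of the labels minus the weighted entropy after splitting on attr."""
    n = len(labels)
    groups = {}
    for a, y in zip(attr, labels):
        groups.setdefault(a, []).append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def chi_square(attr, labels):
    """Pearson chi-square statistic and degrees of freedom for the attr x class table."""
    n = len(labels)
    a_counts, y_counts = Counter(attr), Counter(labels)
    observed = Counter(zip(attr, labels))
    stat = 0.0
    for a, na in a_counts.items():
        for y, ny in y_counts.items():
            expected = na * ny / n
            stat += (observed[(a, y)] - expected) ** 2 / expected
    df = (len(a_counts) - 1) * (len(y_counts) - 1)
    return stat, df


def simulate(num_values, n_cases=200, n_trials=300, seed=0):
    """Mean information gain and mean chi-square/df for a purely random attribute.

    Assumption: attribute and class are drawn independently and uniformly,
    so the attribute carries no real information about the class.
    """
    rng = random.Random(seed)
    gains, ratios = [], []
    for _ in range(n_trials):
        labels = [rng.randrange(2) for _ in range(n_cases)]
        attr = [rng.randrange(num_values) for _ in range(n_cases)]
        stat, df = chi_square(attr, labels)
        if df == 0:  # degenerate sample (one class or one value observed); skip it
            continue
        gains.append(information_gain(attr, labels))
        ratios.append(stat / df)
    return sum(gains) / len(gains), sum(ratios) / len(ratios)


if __name__ == "__main__":
    for k in (2, 5, 20):
        mean_gain, mean_ratio = simulate(k)
        print(f"values={k:2d}  mean gain={mean_gain:.4f}  mean chi2/df={mean_ratio:.2f}")
```

Under these assumptions the mean gain grows with the number of values even though no attribute is informative, while the chi-square statistic scaled by its degrees of freedom does not show the same drift. The sketch covers information gain only; gain ratio and the Lopez de Mantaras measure are not implemented here.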
Keywords: Decision trees, noise, induction, unbiased attribute selection, information-based measures
Paper URL: https://doi.org/10.1023/A:1022694010754