Discretisation of Continuous Commercial Database Features for a Simulated Annealing Data Mining Algorithm
Authors: Justin C.W. Debuse, Victor J. Rayward-Smith
Abstract
An introduction to the approaches used to discretise continuous database features is given, together with a discussion of the potential benefits of such techniques. These benefits are investigated by applying discretisation algorithms to two large commercial databases; the resulting discretisations are then evaluated using a simulated annealing based data mining algorithm. The results suggest that dramatic reductions in problem size may be achieved, yielding improvements in the speed of the data mining algorithm. However, it is also demonstrated that, under certain circumstances, the discretisation produced may increase problem size or allow overfitting by the data mining algorithm. Such cases, in which often only a small proportion of the database belongs to the class of interest, highlight the need both for caution when producing discretisations and for the development of more robust discretisation algorithms.
Keywords: discretisation, data mining, simulated annealing
Paper URL: https://doi.org/10.1023/A:1008339026836