Using Hellinger distance in a nearest neighbour classifier for relational databases

Abstract

Nearest neighbour algorithms classify a previously unseen input case by finding similar cases and using them to predict the unknown features of the input case. Their usefulness has been demonstrated in many real-world domains. Unfortunately, most of the similarity measures discussed in the current nearest neighbour learning literature handle only a limited range of data types, which limits their applicability to relational database applications.

In this paper, we propose an enhanced nearest neighbour learning algorithm that is applicable to relational databases. The proposed method allows similarity to be defined on a wide spectrum of attribute types, and it automatically assigns each attribute a weight reflecting its importance with respect to the target attribute. The method has been implemented as a computer program, and its effectiveness has been tested on four publicly available machine learning databases. Its performance is compared with that of another well-known machine learning method, C4.5. Our experiments show that the classification accuracy of the proposed system was superior to that of C4.5 in most cases.
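To make the idea in the title concrete, the following is a minimal sketch of a weighted nearest neighbour classifier that uses the Hellinger distance on nominal attributes. It is not the paper's algorithm: the abstract does not give the similarity definition or the weighting scheme. The sketch assumes that two attribute values are compared through their estimated class distributions, and that attribute weights are supplied by the caller (equal weights by default). The names `hellinger`, `class_distributions` and `predict` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def hellinger(p, q):
    """Hellinger distance between two discrete distributions given as dicts."""
    keys = set(p) | set(q)
    total = sum(
        (math.sqrt(p.get(k, 0.0)) - math.sqrt(q.get(k, 0.0))) ** 2 for k in keys
    )
    return math.sqrt(total / 2.0)


def class_distributions(rows, labels, attr):
    """Estimate P(class | value) for each value of one nominal attribute."""
    counts = defaultdict(Counter)
    for row, label in zip(rows, labels):
        counts[row[attr]][label] += 1
    return {
        value: {c: n / sum(cnt.values()) for c, n in cnt.items()}
        for value, cnt in counts.items()
    }


def predict(rows, labels, query, weights=None):
    """Return the label of the stored case closest to the query (1-NN).

    Assumption: the distance between two cases is the weighted sum, over
    attributes, of the Hellinger distance between the class distributions
    of their attribute values. The weights are caller-supplied here; the
    paper derives them automatically by a method not shown in the abstract.
    """
    attrs = list(query)
    weights = weights or {a: 1.0 for a in attrs}
    dists = {a: class_distributions(rows, labels, a) for a in attrs}

    def distance(row):
        return sum(
            weights[a]
            * hellinger(dists[a].get(row[a], {}), dists[a].get(query[a], {}))
            for a in attrs
        )

    best = min(range(len(rows)), key=lambda i: distance(rows[i]))
    return labels[best]


# Toy data (invented for illustration only).
rows = [
    {"colour": "red", "size": "small"},
    {"colour": "red", "size": "large"},
    {"colour": "blue", "size": "small"},
    {"colour": "blue", "size": "large"},
]
labels = ["yes", "yes", "no", "no"]
print(predict(rows, labels, {"colour": "red", "size": "large"}))
```

In this toy data the class depends only on `colour`, so the two `size` values have identical class distributions and contribute zero distance. A learned weighting, as the abstract describes, would serve a similar purpose by down-weighting attributes that say little about the target.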

Keywords: Machine learning, Nearest neighbour algorithm, Classification

Review history: Received 22 July 1997, Accepted 10 August 1999, Available online 3 November 1999.

Official paper URL: https://doi.org/10.1016/S0950-7051(99)00041-6