Nearest neighbor regression in the presence of bad hubs
Abstract
Prediction on a numeric scale, i.e., regression, is one of the most prominent machine learning tasks, with various applications in finance, medicine, and the social and natural sciences. Due to its simplicity, theoretical performance guarantees, and successful real-world applications, k nearest neighbor (kNN) regression is one of the most popular regression techniques. However, kNN approaches are affected by the presence of bad hubs, a recently observed phenomenon according to which some instances are similar to surprisingly many other instances and have a detrimental effect on overall prediction performance. This paper is the first to study bad hubs in the context of regression. We propose hubness-aware nearest neighbor regression schemes and evaluate them on publicly available real-world datasets from various domains. Our results show that the proposed approaches outperform various other regression schemes, such as kNN regression, regression trees, and neural networks. We also evaluate the proposed approaches in the presence of label noise, because tolerance to noise is one of the most relevant aspects from the point of view of real-world applications. In particular, we perform experiments under the assumption of conventional Gaussian label noise and an adapted version of the recently proposed hubness-proportional random label noise.
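The abstract does not spell out the weighting scheme the paper proposes. As a rough illustration of how a hubness-aware kNN regressor can be built, the sketch below computes each training instance's k-occurrence count (how often it appears among other instances' k nearest neighbors) and the label error it induces when it acts as a neighbor, then down-weights frequently occurring, error-inducing instances (the "bad hubs") in the kNN prediction. The function name, the badness score, and the 1/(1+badness) weighting are illustrative assumptions, not the authors' exact scheme.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors


def hubness_aware_knn_regression(X_train, y_train, X_test, k=5):
    """Weighted kNN regression that down-weights bad hubs (illustrative sketch)."""
    y_train = np.asarray(y_train, dtype=float)
    n = len(X_train)

    # Fit once; reused for both hub detection and prediction.
    nn = NearestNeighbors(n_neighbors=k).fit(X_train)

    # Neighbors of the training points themselves: ask for k+1 and drop the
    # first column, since each point is trivially its own nearest neighbor.
    _, neigh = nn.kneighbors(X_train, n_neighbors=k + 1)
    neigh = neigh[:, 1:]

    occurrences = np.zeros(n)    # N_k(x): how often x occurs as a neighbor
    induced_error = np.zeros(n)  # total label error x induces as a neighbor
    for i in range(n):
        for j in neigh[i]:
            occurrences[j] += 1
            induced_error[j] += abs(y_train[i] - y_train[j])

    # "Badness" of an instance: mean label error it induces when it is used
    # as a neighbor; bad hubs are frequent neighbors with high badness.
    badness = induced_error / np.maximum(occurrences, 1)

    # Down-weight bad hubs (the 1/(1+badness) form is an assumption).
    weights = 1.0 / (1.0 + badness)

    # Prediction: weighted mean of the labels of the k nearest neighbors.
    _, test_neigh = nn.kneighbors(X_test)
    return np.array([np.average(y_train[idx], weights=weights[idx])
                     for idx in test_neigh])


# Example usage on synthetic data:
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = X[:, 0] + 0.1 * rng.normal(size=200)
print(hubness_aware_knn_regression(X[:150], y[:150], X[150:], k=5)[:3])
```

Plain kNN regression corresponds to uniform weights; the sketch only changes how neighbor labels are averaged, so any badness estimate or weighting function could be substituted in its place.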
Keywords: Nearest neighbor regression, Hubs, Intrinsic dimensionality, Machine learning
Article history: Received 1 December 2014, Revised 8 June 2015, Accepted 9 June 2015, Available online 16 June 2015, Version of Record 31 July 2015.
DOI: https://doi.org/10.1016/j.knosys.2015.06.010