CLOVER: a faster prior-free approach to rare-category detection

Authors: Hao Huang, Qinming He, Kevin Chiew, Feng Qian, Lianhang Ma

Abstract

Rare-category detection helps discover new rare classes in an unlabeled data set by selecting candidate data examples of these classes for labeling. Most existing approaches to rare-category detection require prior information about the data set and cannot be applied without it. Prior-free algorithms address this problem without any prior information about the data set, but at the cost of high time complexity, no lower than \(O(dN^2)\), where \(N\) is the number of data examples in the data set and \(d\) is its dimension. In this paper, we propose CLOVER, a prior-free algorithm built on a novel rare-category criterion called local variation degree (LVD). LVD exploits the characteristics of rare classes to distinguish rare-class data examples from other types of data examples, and CLOVER selects the data examples with the maximum LVD values for labeling. A remarkable improvement is that CLOVER's time complexity is \(O(dN^{2-1/d})\) for \(d > 1\) and \(O(N\log N)\) for \(d = 1\). Extensive experiments on real data sets demonstrate the effectiveness and efficiency of our method in discovering new rare classes with lower time complexity.
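
The size of the stated complexity gain can be seen by dividing the two bounds directly. This is a rough back-of-the-envelope comparison of asymptotic bounds only. It ignores constant factors and treats \(O(dN^2)\) as the cost of the earlier prior-free algorithms. It says nothing about measured running times:

\[
\frac{dN^2}{dN^{2-1/d}} = N^{1/d} \quad (d > 1), \qquad \frac{N^2}{N\log N} = \frac{N}{\log N} \quad (d = 1).
\]

For example, with \(N = 10^6\) and \(d = 2\) the ratio is \(N^{1/2} = 10^3\). With \(d = 10\) it falls to \(N^{1/10} = 10^{0.6} \approx 4\). The asymptotic advantage therefore shrinks as the dimension grows.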

Keywords: Rare-category detection, Local variation degree, \(k\)NN, M\(k\)NN, Histogram density estimation

Paper URL: https://doi.org/10.1007/s10115-012-0530-9