Characterizing the scale dimension of a high-dimensional classification problem

Abstract

Classification of high-dimensional data is inherently difficult. We present an exploratory data analysis methodology for characterizing the scale dimension of a classification problem. The idea is to characterize the support of one distinguished target class as a collection of balls covering the class, with each ball centered at an observation in that class and given the maximal radius for which it contains no observations from the other classes. The scale dimension is defined to be the number of distinct radii (ball sizes) required to cover the class without covering observations from the other classes. A greedy algorithm is used to fit the balls. The balls then provide a description of the support of the target class, with information about the complexity of the classification problem implicit in the number, radii, adjacency, and position of the balls. Clustering the balls by radius and pruning the cluster tree yields an estimate of the scale dimension for the problem. We illustrate the methodology with pedagogical simulations and a chemical sensor data analysis application.
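
The ball-fitting step described in the abstract can be illustrated with a minimal sketch. It assumes Euclidean distance, open balls, and a standard greedy set-cover rule (repeatedly pick the ball that covers the most still-uncovered target observations); the abstract does not specify these details, so they are assumptions. The function names `class_cover` and `count_scales` and the gap tolerance `tol` are hypothetical, and `count_scales` is only a crude stand-in for the clustering and tree pruning the paper uses to estimate the scale dimension.

```python
import numpy as np

def class_cover(target, others):
    """Greedily cover `target` points with balls that contain no `others` points."""
    target = np.asarray(target, dtype=float)
    others = np.asarray(others, dtype=float)
    # Distance from every target observation to every other-class observation.
    d_to_others = np.linalg.norm(target[:, None, :] - others[None, :, :], axis=2)
    # Maximal radius of an open ball: distance to the nearest other-class point.
    radii = d_to_others.min(axis=1)
    # covers[i, j] is True when the open ball centered at target[i] contains target[j].
    d_within = np.linalg.norm(target[:, None, :] - target[None, :, :], axis=2)
    covers = d_within < radii[:, None]
    uncovered = np.ones(len(target), dtype=bool)
    chosen = []
    while uncovered.any():
        # Number of still-uncovered target points each candidate ball would cover.
        gains = (covers & uncovered).sum(axis=1)
        best = int(np.argmax(gains))
        if gains[best] == 0:
            # Remaining points coincide with other-class points and cannot be covered.
            break
        chosen.append(best)
        uncovered &= ~covers[best]
    return chosen, radii[np.array(chosen, dtype=int)]

def count_scales(radii, tol):
    """Count groups of radii separated by gaps larger than `tol` (assumed proxy)."""
    r = np.sort(np.asarray(radii, dtype=float))
    if r.size == 0:
        return 0
    return int((np.diff(r) > tol).sum()) + 1

# Two tight target clusters on either side of a single other-class point.
target = [[0.0, 0.0], [0.1, 0.0], [5.0, 0.0], [5.1, 0.0]]
others = [[2.5, 0.0]]
centers, ball_radii = class_cover(target, others)
print(centers, ball_radii, count_scales(ball_radii, tol=0.5))
```

In this toy example two balls of equal radius suffice, so the proxy reports a single scale. A target class with both broad and narrow regions would require balls of visibly different sizes, which is the situation the scale dimension is meant to summarize.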

Keywords: Scale dimension, Classification, Exploratory data analysis, Interpoint distance, Reduced kernel estimator, Random graph, Class cover, High-dimensional data, Artificial nose

Article history: Received 18 January 2001, Revised 16 October 2001, Accepted 17 December 2001, Available online 17 February 2006.

Paper URL: https://doi.org/10.1016/S0031-3203(02)00042-0