On using prototype reduction schemes to optimize dissimilarity-based classification

Abstract

The aim of this paper is to present a strategy by which a new philosophy for pattern classification, namely that pertaining to dissimilarity-based classifiers (DBCs), can be efficiently implemented. This methodology, proposed by Duin and his co-authors (see Refs. [Experiments with a featureless approach to pattern recognition, Pattern Recognition Lett. 18 (1997) 1159–1166; Relational discriminant analysis, Pattern Recognition Lett. 20 (1999) 1175–1181; Dissimilarity representations allow for building good classifiers, Pattern Recognition Lett. 23 (2002) 943–956; Dissimilarity representations in pattern recognition, Concepts, theory and applications, Ph.D. Thesis, Delft University of Technology, Delft, The Netherlands, 2005; Prototype selection for dissimilarity-based classifiers, Pattern Recognition 39 (2006) 189–208]), is a way of defining classifiers between the classes that is based not on the feature measurements of the individual patterns, but rather on a suitable dissimilarity measure between them. The advantage of this methodology is that, since it does not operate on the class-conditional distributions, the accuracy can exceed the Bayes' error bound. The problem with this strategy is, however, the need to compute, store and process the inter-pattern dissimilarities for all the training samples; thus, the accuracy of the classifier designed in the dissimilarity space depends on the methods used to achieve this. In this paper, we suggest a novel strategy to enhance the computation for all families of DBCs. Rather than compute, store and process the DBC based on the entire data set, we advocate that the training set first be reduced to a smaller representative subset. Also, rather than determine this subset by random selection, clustering, or similar means, we advocate the use of a prototype reduction scheme (PRS), whose output yields the points to be utilized by the DBC. The rationale for this is explained in the paper.
Apart from utilizing PRSs, in the paper we also propose simultaneously employing the Mahalanobis distance as the dissimilarity-measurement criterion to increase the DBC's classification accuracy. Our experimental results demonstrate that the proposed mechanism increases the classification accuracy when compared with the “conventional” approaches on both real-life and artificial data sets, even though the resulting dissimilarity criterion is not symmetric.
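
To make the pipeline described above concrete, the following is a minimal Python sketch, not the implementation used in the paper. Several choices are assumptions made purely for illustration: Hart's condensed nearest neighbour rule stands in for the PRS (the abstract does not name a specific scheme), the Mahalanobis dissimilarity uses the covariance of the prototype's own class (one way in which such a measure can become asymmetric), and a nearest-mean rule serves as the classifier in the dissimilarity space. All function names and the synthetic data are hypothetical.

```python
import numpy as np

def condensed_nn(X, y):
    """Hart's condensed nearest neighbour rule (assumed stand-in for a PRS).

    Returns indices of a subset that classifies every training sample
    correctly under the 1-NN rule.
    """
    keep = [0]
    changed = True
    while changed:
        changed = False
        for i in range(len(X)):
            if i in keep:
                continue
            d = np.linalg.norm(X[keep] - X[i], axis=1)
            # Add the sample if its nearest kept prototype has the wrong label.
            if y[keep[int(np.argmin(d))]] != y[i]:
                keep.append(i)
                changed = True
    return np.array(keep)

def class_inverse_covariances(X, y, ridge=1e-6):
    """Inverse covariance per class; the small ridge term keeps it invertible."""
    inv = {}
    for c in np.unique(y):
        S = np.cov(X[y == c], rowvar=False) + ridge * np.eye(X.shape[1])
        inv[c] = np.linalg.inv(S)
    return inv

def dissimilarity_matrix(X, P, P_labels, inv_cov):
    """D[i, j] = Mahalanobis dissimilarity from sample i to prototype j.

    Assumption: the covariance of prototype j's class is used, so in general
    d(x, p) != d(p, x) when x and p belong to different classes.
    """
    D = np.empty((len(X), len(P)))
    for j, (p, c) in enumerate(zip(P, P_labels)):
        diff = X - p
        sq = np.einsum("ij,jk,ik->i", diff, inv_cov[c], diff)
        D[:, j] = np.sqrt(np.maximum(sq, 0.0))
    return D

def fit_nearest_mean(D, y):
    """Class means of the dissimilarity vectors (a very simple DBC)."""
    classes = np.unique(y)
    means = np.stack([D[y == c].mean(axis=0) for c in classes])
    return classes, means

def predict_nearest_mean(D, classes, means):
    dist = np.linalg.norm(D[:, None, :] - means[None, :, :], axis=2)
    return classes[np.argmin(dist, axis=1)]

# Synthetic two-class data, for illustration only.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (100, 2)), rng.normal(3.0, 1.0, (100, 2))])
y = np.repeat([0, 1], 100)

proto_idx = condensed_nn(X, y)                      # reduce the training set
inv_cov = class_inverse_covariances(X, y)
D_train = dissimilarity_matrix(X, X[proto_idx], y[proto_idx], inv_cov)
classes, means = fit_nearest_mean(D_train, y)       # classifier in dissimilarity space
accuracy = (predict_nearest_mean(D_train, classes, means) == y).mean()
```

The point of the sketch is the shape of `D_train`: each sample is represented by its dissimilarities to the reduced prototype set only, so the matrix has one column per prototype rather than one per training sample.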