On Data and Algorithms: Understanding Inductive Performance

Authors: Alexandros Kalousis, João Gama, Melanie Hilario

Abstract

In this paper we address two symmetrical issues: the discovery of similarities among classification algorithms and among datasets. Both rest on error measures, which we use to define the error correlation between two algorithms and to determine the relative performance of a list of algorithms. We use the first to discover similarities between learners, and both of them to discover similarities between datasets. The dataset similarities sketch maps on the dataset space, and regions within each map exhibit specific patterns of error correlation or relative performance. To understand the factors that determine these regions, we describe them using simple characteristics of the datasets: each region is described in terms of the distributions of dataset characteristics within it.
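
The abstract does not state how the two error-based measures are computed, so the following is only a minimal sketch under stated assumptions. It takes error correlation to be the fraction of instances on which both learners err, among the instances on which at least one errs, and takes relative performance to be an ordering of learners by error rate. The function names `error_correlation` and `rank_by_error` are hypothetical and may not match the definitions used in the paper.

```python
from typing import Dict, List, Sequence


def error_correlation(y_true: Sequence, pred_a: Sequence, pred_b: Sequence) -> float:
    """Assumed definition: P(both learners err | at least one errs).

    Returns 0.0 when neither learner makes any error.
    """
    both = 0    # instances misclassified by both learners
    either = 0  # instances misclassified by at least one learner
    for t, a, b in zip(y_true, pred_a, pred_b):
        err_a = a != t
        err_b = b != t
        if err_a or err_b:
            either += 1
            if err_a and err_b:
                both += 1
    return both / either if either else 0.0


def rank_by_error(error_rates: Dict[str, float]) -> List[str]:
    """Assumed notion of relative performance: learners sorted by
    increasing error rate on one dataset (best first)."""
    return sorted(error_rates, key=lambda name: error_rates[name])


# Toy labels and predictions, invented purely for illustration.
y_true = [0, 1, 1, 0, 1]
pred_a = [0, 1, 0, 1, 1]  # errs on instances 2 and 3
pred_b = [0, 0, 0, 0, 1]  # errs on instances 1 and 2
corr = error_correlation(y_true, pred_a, pred_b)
ranking = rank_by_error({"tree": 0.20, "nb": 0.15, "knn": 0.30})
```

Under these assumptions, a dataset could be represented by its vector of pairwise error correlations or by its learner ranking, and datasets with similar vectors would fall in the same region of the map.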

Keywords: classification, meta-learning, error correlation, classifier ranking, clustering datasets, clustering classifiers

Paper URL: https://doi.org/10.1023/B:MACH.0000015882.38031.85