Can classification performance be predicted by complexity measures? A study using microarray data

Authors: L. Morán-Fernández, V. Bolón-Canedo, A. Alonso-Betanzos

Abstract

Data complexity analysis enables an understanding of whether classification performance could be affected, not by algorithm limitations, but by intrinsic data characteristics. Microarray datasets based on high numbers of gene expressions combined with small sample sizes represent a particular challenge for machine learning researchers. This type of data also has other particularities that may negatively affect the generalization capacity of classifiers, such as overlaps between classes and class imbalance. Making use of several complexity measures, we analyzed the intrinsic complexity of several microarray datasets with and without feature selection and then explored the connection with the empirical results obtained by four widely used classifiers. Experimental results for 21 binary and multiclass datasets demonstrate that a correlation exists between microarray data complexity and the classification error rates.
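To make the idea of a complexity measure concrete, here is a minimal sketch of one classic measure of class overlap, the maximum Fisher discriminant ratio. The abstract does not say which measures the authors used, so this choice is an assumption made for illustration. The function name `fisher_ratio` and the toy data are hypothetical, and the sketch handles two-class problems only.

```python
from statistics import fmean, pvariance


def fisher_ratio(X, y):
    """Maximum per-feature Fisher discriminant ratio for a two-class dataset.

    X is a list of samples (each a list of feature values), y the class labels.
    Higher values mean at least one feature separates the classes well,
    i.e. lower complexity. Illustrative only; not taken from the paper.
    """
    labels = sorted(set(y))
    if len(labels) != 2:
        raise ValueError("this sketch handles two-class problems only")
    best = 0.0
    for column in zip(*X):  # iterate over features (genes)
        a = [v for v, label in zip(column, y) if label == labels[0]]
        b = [v for v, label in zip(column, y) if label == labels[1]]
        num = (fmean(a) - fmean(b)) ** 2      # squared distance between class means
        den = pvariance(a) + pvariance(b)     # sum of within-class variances
        if den > 0:
            ratio = num / den
        elif num > 0:
            ratio = float("inf")              # zero spread, distinct means
        else:
            ratio = 0.0                       # feature is constant across classes
        best = max(best, ratio)
    return best


# Toy data: feature 0 separates the classes, feature 1 barely does.
X = [[0.0, 5.0], [2.0, 7.0], [10.0, 6.0], [12.0, 8.0]]
y = [0, 0, 1, 1]
print(fisher_ratio(X, y))  # 50.0
```

In a study of the kind described, a value like this would be computed per dataset, before and after feature selection, and then compared with each classifier's error rate.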

Keywords: Data complexity measures, Classification, Microarray data, Feature selection, Filters

Paper URL: https://doi.org/10.1007/s10115-016-1003-3