A comparison of classification methods across different data complexity scenarios and datasets

Authors:

Highlights:

• We compare methods for binary classification on synthetic datasets.

• We generate data for four complexity scenarios and five data characteristics.

• Heterogeneous ensembles perform best on average.

• Nearest shrunken centroids are recommended for unbalanced training data.

• Bagged CART is recommended for large training data with low dimensionality.

Keywords: Binary classification, Classification methods, Performance comparison, Data characteristics

Review history: Received 23 October 2018, Revised 6 March 2020, Accepted 1 November 2020, Available online 3 November 2020, Version of Record 24 January 2021.

Paper URL: https://doi.org/10.1016/j.eswa.2020.114217