Handling data irregularities in classification: Foundations, trends, and future challenges

Authors:

Highlights:

• Data irregularities can significantly degrade the performance of classifiers.

• We present a comprehensive taxonomy and survey of various data irregularities.

• We discuss prominent methods to handle distribution and feature-based irregularities.

• We highlight the co-occurrences and interrelations among different irregularities.

• We unearth a number of promising future research avenues.

Keywords: Data irregularities, Class imbalance, Small disjuncts, Class-distribution skew, Missing features, Absent features

Review history: Received 4 October 2017, Revised 16 January 2018, Accepted 4 March 2018, Available online 14 March 2018, Version of Record 24 May 2018.

Official URL: https://doi.org/10.1016/j.patcog.2018.03.008