Multi-class imbalanced big data classification on Spark

Authors:

Highlights:

• First complete framework for learning from multi-class imbalanced big data.

• Informative multi-class sampling methods that use instance-level characteristics.

• Novel oversampling modification dedicated to MapReduce environments.

• Code and data repository for reproducibility and applications of proposed methods.

Keywords: Machine learning, Big data, Imbalanced data classification, Multi-class imbalance, Spark, MapReduce

Review history: Received 20 June 2020, Revised 24 September 2020, Accepted 3 November 2020, Available online 7 November 2020, Version of Record 27 November 2020.

Official paper URL: https://doi.org/10.1016/j.knosys.2020.106598