Multi-class imbalanced big data classification on Spark
Highlights:
• First complete framework for learning from multi-class imbalanced big data.
• Informative multi-class sampling methods that use instance-level characteristics.
• Novel oversampling modification dedicated to MapReduce environments.
• Code and data repository for reproducibility and applications of proposed methods.
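The second highlight refers to sampling guided by instance-level characteristics, but this listing does not describe the procedure. The sketch below assumes the widely used k-nearest-neighbour taxonomy of Napierala and Stefanowski. With k = 5, each instance is labelled safe, borderline, rare, or outlier according to how many of its neighbours share its class. It is a plain single-machine Python illustration, not the paper's Spark implementation, and the function name `instance_types` is hypothetical.

```python
import math


def instance_types(points, labels, k=5):
    """Hypothetical helper: label each instance as safe, borderline, rare, or outlier.

    Assumes the k-NN taxonomy with k = 5 (thresholds below are tuned for k = 5):
    4-5 same-class neighbours -> safe, 2-3 -> borderline, 1 -> rare, 0 -> outlier.
    """
    types = []
    for i, (p, y) in enumerate(zip(points, labels)):
        # Brute-force k-NN: distance to every other instance, excluding the instance itself.
        neighbours = sorted(
            ((math.dist(p, q), labels[j]) for j, q in enumerate(points) if j != i),
            key=lambda pair: pair[0],  # sort by distance only, never by label
        )
        same = sum(1 for _, lab in neighbours[:k] if lab == y)
        if same >= 4:
            types.append("safe")
        elif same >= 2:
            types.append("borderline")
        elif same == 1:
            types.append("rare")
        else:
            types.append("outlier")
    return types


# Five clustered class-0 points plus one distant class-1 point.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (10, 10)]
labels = [0, 0, 0, 0, 0, 1]
print(instance_types(points, labels))
```

In a multi-class setting, one possible design is to oversample the borderline and rare instances of minority classes more heavily than the safe ones. This is an illustrative choice, not necessarily the weighting the paper uses.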
Keywords: Machine learning, Big data, Imbalanced data classification, Multi-class imbalance, Spark, MapReduce
Article history: Received 20 June 2020, revised 24 September 2020, accepted 3 November 2020, available online 7 November 2020, version of record 27 November 2020.
Official paper link: https://doi.org/10.1016/j.knosys.2020.106598