Big data classification using heterogeneous ensemble classifiers in Apache Spark based on MapReduce paradigm

Authors:

Highlights:

• Distributed Heterogeneous Ensemble is designed for big data classification.

• Classifiers are pruned from the ensemble to increase the diversity.

• A Spark version of DHBoost is presented, based on the MapReduce programming paradigm.

• DHBoost outperforms the state-of-the-art ensemble classifiers in the Spark library.

Keywords: Ensemble classifier, Boosting, MapReduce, Big data, Apache Spark, Apache Hadoop

Review history: Received 18 July 2020, Revised 16 April 2021, Accepted 5 June 2021, Available online 11 June 2021, Version of Record 29 June 2021.

Official paper URL: https://doi.org/10.1016/j.eswa.2021.115369