A comparative evaluation of aggregation methods for machine learning over vertically partitioned data

Authors:

Highlights:

• We compare aggregation methods for vertically partitioned data in several scenarios.

• The impact of dataset characteristics on aggregator performance is investigated.

• The silhouette coefficient and the imbalance coefficient are the most influential characteristics.

• The impact of each characteristic varies with the specific scenario.

• Decision and regression trees are trained to guide the aggregator choice.

Keywords: Vertical data partitioning, Distributed machine learning, Classification, Predictions aggregation, Attribute-partitioned data

Review history: Received 6 August 2019, Revised 2 March 2020, Accepted 23 March 2020, Available online 4 April 2020, Version of Record 15 April 2020.

Official paper URL: https://doi.org/10.1016/j.eswa.2020.113406