PUMA: Parallel subspace clustering of categorical data using multi-attribute weights

Authors:

Highlights:

• We propose a way of calculating attribute weights through co-occurrence probabilities of attribute values among multiple dimensions.

• We design a subspace clustering algorithm driven by co-occurrence frequencies of multiple attributes of categorical data.

• We implement the two-stage clustering algorithm using the MapReduce programming model.
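The highlights above name co-occurrence probabilities of attribute values and a MapReduce implementation, but this listing does not give the weighting formula. The sketch below is therefore only a rough illustration, not the PUMA algorithm: it counts how often pairs of attribute values occur together in a map/reduce style and turns the counts into conditional co-occurrence probabilities. All function names (`map_record`, `reduce_counts`, `cooccurrence_probabilities`) and the toy records are hypothetical, and the whole thing runs in a single process rather than on a MapReduce cluster.

```python
from collections import Counter
from itertools import combinations


def map_record(record):
    """Map step (hypothetical): emit ((i, vi, j, vj), 1) for every attribute pair i < j."""
    for (i, vi), (j, vj) in combinations(enumerate(record), 2):
        yield (i, vi, j, vj), 1


def reduce_counts(pairs):
    """Reduce step (hypothetical): sum the counts emitted for each key."""
    totals = Counter()
    for key, count in pairs:
        totals[key] += count
    return totals


def cooccurrence_probabilities(records):
    """Estimate P(attribute j = vj | attribute i = vi) for i < j from the counts."""
    pair_counts = reduce_counts(kv for r in records for kv in map_record(r))
    value_counts = Counter((i, v) for r in records for i, v in enumerate(r))
    return {
        key: n / value_counts[(key[0], key[1])]
        for key, n in pair_counts.items()
    }


# Toy categorical data set (assumed for illustration only).
records = [
    ("red", "small", "round"),
    ("red", "small", "square"),
    ("blue", "large", "round"),
]
probs = cooccurrence_probabilities(records)
```

Here `probs[(0, "red", 1, "small")]` is the fraction of records with value "red" in attribute 0 that also have "small" in attribute 1. How such probabilities are combined into attribute weights is specific to the paper and is not reproduced here.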

Keywords: Parallel subspace clustering, Multi-attribute weights, High dimension, Categorical data, MapReduce

Review history: Received 6 June 2018, Revised 16 October 2018, Accepted 24 February 2019, Available online 25 February 2019, Version of Record 1 March 2019.

Official paper URL: https://doi.org/10.1016/j.eswa.2019.02.030