PUMA: Parallel subspace clustering of categorical data using multi-attribute weights
Authors:
Highlights:
• We propose a way of calculating attribute weights through co-occurrence probabilities of attribute values among multiple dimensions.
• We design a subspace clustering algorithm driven by co-occurrence frequencies of multiple attributes of categorical data.
• We implement the two-stage clustering algorithm using the MapReduce programming model.
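The highlights name the ingredients without giving formulas: co-occurrence probabilities of attribute values across dimensions, computed with MapReduce. The Python sketch below is minimal and illustrative. It shows how pairwise co-occurrence counts could be gathered in map/reduce form and turned into probabilities.

It makes these assumptions:
- Records are equal-length tuples of categorical values.
- The function names (`map_record`, `reduce_counts`, `cooccurrence_probabilities`) are hypothetical.
- The shuffle/grouping phase is simulated in-process.

It is not PUMA's actual weighting or clustering procedure, which these highlights do not specify.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Hashable, Iterable, Iterator, List, Sequence, Tuple

# Hypothetical key layout: ((attr_i, value_i), (attr_j, value_j)) with attr_i < attr_j.
Key = Tuple[Tuple[int, Hashable], Tuple[int, Hashable]]


def map_record(record: Sequence[Hashable]) -> Iterator[Tuple[Key, int]]:
    """Map step (illustrative): emit (pair, 1) for every pair of attribute values in one record."""
    indexed = list(enumerate(record))
    # combinations() keeps input order, so attribute index i is always < j.
    for (i, a), (j, b) in combinations(indexed, 2):
        yield ((i, a), (j, b)), 1


def reduce_counts(pairs: Iterable[Tuple[Key, int]]) -> Dict[Key, int]:
    """Reduce step (illustrative): sum emitted counts per key; grouping is simulated in memory."""
    totals: Counter = Counter()
    for key, count in pairs:
        totals[key] += count
    return dict(totals)


def cooccurrence_probabilities(records: List[Sequence[Hashable]]) -> Dict[Key, float]:
    """Estimate P(attr_i = a, attr_j = b) as (co-occurrence count) / (number of records)."""
    n = len(records)
    emitted = (kv for rec in records for kv in map_record(rec))
    counts = reduce_counts(emitted)
    return {key: c / n for key, c in counts.items()}


if __name__ == "__main__":
    data = [
        ("red", "small", "round"),
        ("red", "small", "square"),
        ("blue", "large", "round"),
        ("red", "large", "round"),
    ]
    probs = cooccurrence_probabilities(data)
    print(probs[((0, "red"), (1, "small"))])   # 0.5
    print(probs[((0, "blue"), (1, "large"))])  # 0.25
```

On a real MapReduce deployment, the framework would run `map_record` over each input split and group identical keys before `reduce_counts` sees them. How PUMA derives attribute weights from such probabilities, and how its two clustering stages use them, is not described in this section.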
Keywords: Parallel subspace clustering, Multi-attribute weights, High dimension, Categorical data, MapReduce
Review history: Received 6 June 2018, Revised 16 October 2018, Accepted 24 February 2019, Available online 25 February 2019, Version of Record 1 March 2019.
Official page: https://doi.org/10.1016/j.eswa.2019.02.030