Soft subspace clustering of categorical data with probabilistic distance
Authors:
Highlights:
• We define the cluster scatter on object-to-cluster distances for categorical data.
• We propose a probabilistic distance function using a kernel density estimation method.
• Categorical attributes are weighted based on the smoothed dispersion of categories.
• Two weighting schemes are offered depending on the attribute types.
• The proposed method significantly improves clustering performance compared to mode-based algorithms.
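
The highlights do not give the paper's formulas, so the following is only a minimal sketch of the general idea: an object-to-cluster distance for categorical data built on kernel-smoothed category probabilities. It assumes the Aitchison-Aitken kernel (a standard kernel for unordered categories), a fixed user-chosen bandwidth, and caller-supplied attribute weights. The function names `smoothed_probs` and `object_to_cluster_distance` are hypothetical, and neither of the paper's two weighting schemes is reproduced here.

```python
from collections import Counter


def smoothed_probs(values, categories, bandwidth):
    """Kernel-smoothed probability of each category within one cluster.

    Assumption: Aitchison-Aitken kernel, which gives weight (1 - bandwidth)
    to a matching category and bandwidth / (c - 1) to each other category.
    bandwidth = 0 reduces to plain relative frequencies.
    """
    c = len(categories)
    n = len(values)
    counts = Counter(values)
    probs = {}
    for cat in categories:
        freq = counts[cat] / n
        probs[cat] = (1 - bandwidth) * freq + bandwidth / (c - 1) * (1 - freq)
    return probs


def object_to_cluster_distance(obj, cluster_probs, weights):
    """Weighted squared distance between an object and a cluster.

    Each attribute value is treated as an indicator (one-hot) vector and
    compared with the cluster's smoothed probability vector for that attribute.
    cluster_probs[d] is the dict returned by smoothed_probs for attribute d.
    """
    total = 0.0
    for d, value in enumerate(obj):
        per_attr = sum(
            ((1.0 if cat == value else 0.0) - p) ** 2
            for cat, p in cluster_probs[d].items()
        )
        total += weights[d] * per_attr
    return total


# Illustrative data (made up): one attribute, four objects in the cluster.
cluster_values = ["a", "a", "a", "b"]
probs = smoothed_probs(cluster_values, ["a", "b", "c"], bandwidth=0.3)
near = object_to_cluster_distance(("a",), [probs], weights=[1.0])
far = object_to_cluster_distance(("c",), [probs], weights=[1.0])
```

In this sketch the unseen category "c" still receives a nonzero probability, so objects are compared with a smoothed distribution rather than with a single mode, which is the contrast with mode-based algorithms that the highlights draw.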
Keywords: Subspace clustering, Categorical data, Distance measure, Attribute weighting, Kernel density estimation
Article history: Received 9 May 2015, Revised 19 August 2015, Accepted 24 September 2015, Available online 3 October 2015, Version of Record 27 November 2015.
Paper URL: https://doi.org/10.1016/j.patcog.2015.09.027