Soft subspace clustering of categorical data with probabilistic distance

Authors:

Highlights:

• We define the cluster scatter on object-to-cluster distances for categorical data.

• We propose a probabilistic distance function using a kernel density estimation method.

• Categorical attributes are weighted based on the smoothed dispersion of categories.

• Two weighting schemes are offered depending on the attribute types.

• The proposed method significantly improves clustering performance compared with mode-based algorithms.

Keywords: Subspace clustering, Categorical data, Distance measure, Attribute weighting, Kernel density estimation

Review history: Received 9 May 2015, Revised 19 August 2015, Accepted 24 September 2015, Available online 3 October 2015, Version of Record 27 November 2015.

Official paper URL: https://doi.org/10.1016/j.patcog.2015.09.027