Outlier-eliminated k-means clustering algorithm based on differential privacy preservation

Authors: Qingying Yu, Yonglong Luo, Chuanming Chen, Xintao Ding

Abstract

Individual privacy may be compromised during the process of mining for valuable information, and the potential for data mining is hindered by the need to preserve privacy. It is well known that k-means clustering algorithms based on differential privacy require preserving privacy while maintaining the availability of clustering. However, it is difficult to balance both aspects in traditional algorithms. In this paper, an outlier-eliminated differential privacy (OEDP) k-means algorithm is proposed that both preserves privacy and improves clustering efficiency. The proposed approach selects the initial centre points in accordance with the distribution density of data points, and adds Laplacian noise to the original data for privacy preservation. Both a theoretical analysis and comparative experiments were conducted. The theoretical analysis shows that the proposed algorithm satisfies ε-differential privacy. Furthermore, the experimental results show that, compared to other methods, the proposed algorithm effectively preserves data privacy and improves the clustering results in terms of accuracy, stability, and availability.
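
The three ingredients the abstract names (outlier elimination, density-based selection of initial centres, and Laplacian noise) can be illustrated with a minimal sketch. This is not the authors' OEDP implementation: the neighbourhood `radius`, the `min_neighbours` threshold, the greedy centre selection, and the choice to perturb the per-cluster sum and count (a common pattern in differentially private k-means) are all assumptions made for illustration, and every function name is hypothetical. Coordinates are assumed to be scaled to [0, 1].

```python
import math
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def density(points, radius):
    # Density of a point = number of other points within `radius` of it.
    return [
        sum(1 for j, q in enumerate(points) if j != i and math.dist(p, q) <= radius)
        for i, p in enumerate(points)
    ]


def drop_outliers(points, radius, min_neighbours):
    # Assumed outlier rule: discard points with too few neighbours.
    dens = density(points, radius)
    return [p for p, d in zip(points, dens) if d >= min_neighbours]


def density_based_centres(points, k, radius):
    # Assumed greedy rule: visit points from densest to sparsest and keep
    # a point as a centre if it lies farther than `radius` from all
    # centres chosen so far. May return fewer than k centres.
    dens = density(points, radius)
    order = sorted(range(len(points)), key=lambda i: -dens[i])
    centres = []
    for i in order:
        if all(math.dist(points[i], c) > radius for c in centres):
            centres.append(points[i])
        if len(centres) == k:
            break
    return centres


def noisy_centroid(cluster, epsilon, rng):
    # With coordinates in [0, 1], adding or removing one point changes the
    # per-dimension sums by at most 1 each and the count by 1, so the L1
    # sensitivity of (sums, count) is dim + 1. This covers one update only;
    # an iterative algorithm must split its total budget across iterations.
    dim = len(cluster[0])
    scale = (dim + 1) / epsilon
    count = max(len(cluster) + laplace_noise(scale, rng), 1.0)
    return tuple(
        (sum(p[d] for p in cluster) + laplace_noise(scale, rng)) / count
        for d in range(dim)
    )
```

As a usage example, calling `drop_outliers` on two tight groups of points plus one isolated point removes the isolated point, and `density_based_centres` then returns one seed per group. A smaller `epsilon` gives a larger noise scale, which is the privacy/accuracy trade-off the abstract refers to.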

Keywords: Differential privacy (DP) preservation, k-means clustering, Outlier, OEDP

Paper URL: https://doi.org/10.1007/s10489-016-0813-z