Fast data-oriented microaggregation algorithm for large numerical datasets

Authors:

Highlights:

Abstract

Microaggregation is a successful mechanism for resolving the tension between respondent privacy and data quality in Statistical Disclosure Control. For numerical datasets, microaggregation is defined as a clustering problem in which every group must contain at least k records and the within-group sum of squared errors (SSE) is minimized. Unfortunately, to find a good trade-off between privacy and utility, the data publisher has to run an algorithm repeatedly for different values of k. Running an algorithm many times on a large numerical dataset wastes resources, since most of the computations are repeated. In this paper, we propose a Fast Data-oriented Microaggregation algorithm (FDM) that efficiently anonymizes large multivariate numerical datasets for multiple successive values of k. Experimental results on real-world datasets demonstrate the superiority of the method in terms of both data quality and time complexity. Moreover, the method usually achieves a better trade-off between disclosure risk and information loss in the protected dataset than previous techniques.
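To make the problem definition and the "repeated computation" argument concrete, here is a minimal sketch. It is not FDM. It assumes univariate data and a naive fixed-size heuristic: sort the records, cut them into consecutive groups of k, and merge any leftover records into the last group so every group has between k and 2k − 1 records. The SSE of a group is computed as Σx² − (Σx)²/m from prefix sums. These are built once, after a single sort, and reused for every k, which illustrates how work can be shared across successive values of k. All function names (`fixed_size_groups`, `sse_for_all_k`, `microaggregate`) are hypothetical.

```python
from typing import Dict, List, Tuple


def fixed_size_groups(n: int, k: int) -> List[Tuple[int, int]]:
    """Split n sorted positions into consecutive [start, end) groups of size k.

    The final n % k leftover positions are merged into the last group, so
    every group holds between k and 2k - 1 records (the k-anonymity constraint).
    """
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= n")
    groups = [(s, s + k) for s in range(0, n - n % k, k)]
    groups[-1] = (groups[-1][0], n)  # absorb leftovers into the last group
    return groups


def sse_for_all_k(values: List[float], ks: List[int]) -> Dict[int, float]:
    """Within-group SSE of the fixed-size partition for each k in ks.

    Sorting and the prefix sums are computed once and shared by every k.
    """
    xs = sorted(values)
    ps, ps2 = [0.0], [0.0]  # prefix sums of x and of x^2
    for x in xs:
        ps.append(ps[-1] + x)
        ps2.append(ps2[-1] + x * x)
    result = {}
    for k in ks:
        sse = 0.0
        for a, b in fixed_size_groups(len(xs), k):
            s, s2, m = ps[b] - ps[a], ps2[b] - ps2[a], b - a
            sse += s2 - s * s / m  # sum((x - mean)^2) = sum(x^2) - (sum x)^2 / m
        result[k] = sse
    return result


def microaggregate(values: List[float], k: int) -> List[float]:
    """Replace every record by the mean of its group, keeping the original order."""
    order = sorted(range(len(values)), key=values.__getitem__)
    out = [0.0] * len(values)
    for a, b in fixed_size_groups(len(values), k):
        idx = order[a:b]
        mean = sum(values[i] for i in idx) / len(idx)
        for i in idx:
            out[i] = mean
    return out


print(sse_for_all_k([1, 2, 3, 10, 11, 12], [1, 2, 3]))  # {1: 0.0, 2: 25.5, 3: 4.0}
print(microaggregate([3, 1, 12, 2, 11, 10], 3))  # [2.0, 2.0, 11.0, 2.0, 11.0, 11.0]
```

The output also shows why naive fixed-size grouping is not enough. For k = 2 it yields SSE 25.5, yet the partition {1, 2, 3}, {10, 11, 12} also satisfies the at-least-2 constraint with SSE 4. Minimizing SSE therefore needs smarter grouping than simple consecutive cuts.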

Keywords: Privacy, Microaggregation, k-Anonymity, Statistical Disclosure Control, Savings heuristic

Article history: Received 17 February 2014, Revised 5 May 2014, Accepted 6 May 2014, Available online 21 May 2014.

Paper URL: https://doi.org/10.1016/j.knosys.2014.05.011