Information based data anonymization for classification utility

Abstract

Anonymization is a practical approach to protecting privacy in data. The major objective of privacy-preserving data publishing is to protect private information while keeping the data useful for intended applications, such as building classification models. In this paper, we argue that data generalization in anonymization should be determined by the classification capability of the data rather than by the privacy requirement. We use mutual information to measure the classification capability of a generalization, and propose two k-anonymity algorithms that produce anonymized tables for building accurate classification models. The algorithms first generalize attributes to maximize classification capability, and then suppress values to satisfy either a privacy requirement k (IACk) or distributional constraints (IACc). Experimental results show that IACk supports more accurate classification models and runs faster than a benchmark utility-aware data anonymization algorithm.
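To make the two steps in the abstract concrete, the following is a minimal sketch and not the authors' IACk or IACc implementation. It assumes a single numeric quasi-identifier (age), a toy data set, and fixed-width interval generalization; the function names `mutual_information`, `generalize`, and `suppress_small_groups` are hypothetical. It scores two candidate generalizations by their mutual information with the class label, then suppresses rows in groups smaller than k.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Return I(X;Y) in bits for two equal-length sequences of discrete values."""
    n = len(xs)
    px = Counter(xs)
    py = Counter(ys)
    pxy = Counter(zip(xs, ys))
    # Sum p(x,y) * log2(p(x,y) / (p(x) p(y))) over observed pairs only.
    return sum(
        (c / n) * log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def generalize(age, width):
    """Map an age to a fixed-width interval label (assumed generalization scheme)."""
    lo = (age // width) * width
    return f"{lo}-{lo + width - 1}"


def suppress_small_groups(rows, k):
    """Drop rows whose generalized value occurs fewer than k times."""
    counts = Counter(value for value, _ in rows)
    return [(value, label) for value, label in rows if counts[value] >= k]


# Toy data, invented for illustration only.
ages = [23, 27, 31, 38, 45, 52]
labels = ["low", "low", "low", "high", "high", "high"]

coarse = [generalize(a, 20) for a in ages]  # 20-year intervals
fine = [generalize(a, 10) for a in ages]    # 10-year intervals

mi_coarse = mutual_information(coarse, labels)
mi_fine = mutual_information(fine, labels)

# Pick the generalization that retains more information about the class,
# then enforce k = 2 by suppressing undersized groups.
best = fine if mi_fine >= mi_coarse else coarse
released = suppress_small_groups(list(zip(best, labels)), k=2)
```

On this toy data the finer generalization retains more information about the class label, but two of its groups contain a single row and are suppressed at k = 2. That trade-off between retained information and suppression is what the abstract's generalize-then-suppress ordering has to balance.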

Keywords: Privacy, Anonymization, k-anonymity, Classification, Mutual information, Kullback–Leibler divergence

Article history: Received 27 September 2010, Revised 10 April 2011, Accepted 5 July 2011, Available online 22 July 2011.

Paper URL: https://doi.org/10.1016/j.datak.2011.07.001