An efficient sanitization algorithm for balancing information privacy and knowledge discovery in association patterns mining

Abstract

Discovering frequent patterns in large databases is one of the most studied problems in data mining, since it can yield substantial commercial benefits. However, some of the discovered patterns are sensitive, and disclosing them may compromise privacy. In this paper, we aim to strike an appropriate balance between the need for privacy and the need for information discovery in frequent patterns. We propose a novel method that modifies a database in order to hide sensitive patterns: multiplying the original database by a sanitization matrix yields a sanitized database in which the private content is hidden. In addition, two probabilities are introduced, one to guard against the recovery of sensitive patterns and one to reduce the extent to which non-sensitive patterns are hidden in the sanitized database. A complexity analysis and a security discussion of the proposed sanitization process are provided. Finally, we describe the results of a series of experiments that evaluate the efficiency and effectiveness of the approach.
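
The matrix-multiplication idea can be illustrated with a toy sketch. The abstract does not give the construction of the sanitization matrix or the two probabilities, so everything below is an assumption made for illustration: the database is a binary transaction-by-item matrix, the sanitization matrix `S` is an identity matrix with a -1 entry for one sensitive item pair, and products are clipped back to 0/1. The names `support`, `sanitize`, `DB`, and `S` are hypothetical, and the probability policy is not modeled.

```python
def support(db, pattern):
    """Fraction of transactions (rows) containing every item index in `pattern`."""
    hits = sum(all(row[i] for i in pattern) for row in db)
    return hits / len(db)


def sanitize(db, s):
    """Multiply the binary matrix `db` (n x m) by `s` (m x m) and clip to 0/1.

    Assumed clipping rule (not taken from the paper): a product entry of
    1 or more becomes 1, anything else becomes 0.
    """
    m = len(s)
    out = []
    for row in db:
        new_row = []
        for j in range(m):
            value = sum(row[k] * s[k][j] for k in range(m))
            new_row.append(1 if value >= 1 else 0)
        out.append(new_row)
    return out


# Toy database: rows are transactions, columns are items A, B, C.
DB = [
    [1, 1, 0],
    [1, 1, 1],
    [0, 1, 1],
    [1, 0, 1],
]

# Identity entries keep each item. The -1 at (row A, column B) removes B
# from every transaction that also contains A, hiding the pattern {A, B}.
S = [
    [1, -1, 0],
    [0, 1, 0],
    [0, 0, 1],
]

SANITIZED = sanitize(DB, S)

print(support(DB, [0, 1]), support(SANITIZED, [0, 1]))  # sensitive {A, B}
print(support(DB, [1, 2]), support(SANITIZED, [1, 2]))  # non-sensitive {B, C}
```

In this toy run the support of the sensitive pattern {A, B} drops from 0.5 to 0.0, but the support of the non-sensitive pattern {B, C} also drops from 0.5 to 0.25. That side effect is the kind of loss the abstract says the probabilities are meant to reduce; how they are applied is not specified here.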

Keywords: Data mining, Frequent pattern, Sensitive pattern, Sanitization process, Probability policy

Article history: Received 31 August 2007, Accepted 10 December 2007, Available online 12 January 2008.

Official paper URL: https://doi.org/10.1016/j.datak.2007.12.005