Privacy preserving decision tree learning over multiple parties

Abstract

Data mining over multiple data sources has emerged as an important practical problem with applications in areas such as data streams, data warehouses, and bioinformatics. Although the data sources in these settings are willing to run data mining algorithms jointly, they do not want to reveal any additional information about their data to the other sources because of legal or competitive concerns. One possible solution is to use cryptographic methods. However, the computation and communication complexity of such solutions renders them impractical when a large number of data sources are involved. In this paper, we consider a scenario in which multiple data sources are willing to run data mining algorithms over the union of their data, provided that each source is guaranteed that its information that does not pertain to another data source will not be revealed. We focus on the classification problem in particular and present an efficient algorithm, based on ID3, for building a decision tree over an arbitrary number of distributed sources in a privacy-preserving manner.
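As background for the ID3 setting mentioned in the abstract, the sketch below shows why a decision tree over the union of several parties' data can be built from aggregated counts alone: the ID3 information gain of an attribute depends only on how many records fall into each (attribute value, class) pair. This is a minimal illustration and not the paper's protocol. The names `local_counts` and `information_gain`, the toy `outlook`/`play` records, and the plain addition of counts are all assumptions made for the example; a privacy-preserving scheme would replace the plain addition with some form of secure aggregation so that no party sees another party's individual counts.

```python
from collections import Counter
from math import log2

def entropy(class_counts):
    """Shannon entropy (in bits) of a class-count table."""
    total = sum(class_counts.values())
    return -sum((c / total) * log2(c / total)
                for c in class_counts.values() if c > 0)

def local_counts(rows, attr, label):
    """Hypothetical helper: (attribute value, class) counts held by one party."""
    return Counter((r[attr], r[label]) for r in rows)

def information_gain(joint_counts):
    """ID3 information gain of one attribute from merged (value, class) counts."""
    class_totals = Counter()
    per_value = {}
    for (value, cls), n in joint_counts.items():
        class_totals[cls] += n
        per_value.setdefault(value, Counter())[cls] += n
    total = sum(class_totals.values())
    # Weighted entropy of the partitions induced by the attribute.
    remainder = sum(sum(c.values()) / total * entropy(c)
                    for c in per_value.values())
    return entropy(class_totals) - remainder

# Toy data (assumed for illustration): two parties with the same schema.
party_a = [{"outlook": "sunny", "play": "no"}, {"outlook": "rain", "play": "yes"}]
party_b = [{"outlook": "sunny", "play": "no"}, {"outlook": "rain", "play": "yes"}]

# Plain summation stands in for a secure-sum step (assumption, not the paper's method).
merged = local_counts(party_a, "outlook", "play") + local_counts(party_b, "outlook", "play")
gain = information_gain(merged)
```

Because the merged counts equal the counts computed over the pooled records, the attribute chosen at each node is the same as if ID3 had been run on the union of the data, while the raw records never leave their owners.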

Keywords: Data mining, ID3, Data privacy and security

Review history: Received 18 January 2006, Revised 14 April 2006, Accepted 22 February 2007, Available online 28 March 2007.

Paper URL: https://doi.org/10.1016/j.datak.2007.02.004