Induction of multiclass multifeature split decision trees from distributed data
Abstract
Decision tree-based classification is a popular approach in pattern recognition and data mining. Most decision tree induction methods assume that the training data reside at a single central location. With the growth of distributed databases at geographically dispersed locations, methods for inducing decision trees in distributed settings are gaining importance. This paper describes one such method, which generates compact trees using multifeature splits rather than the single-feature splits produced by most existing methods for distributed data. The method is based on Fisher's linear discriminant function and can handle multiple classes. For homogeneously distributed data, the decision trees produced by our method are identical to those generated using Fisher's linear discriminant function with centrally stored data. For heterogeneously distributed data, an approximation is involved, which results in a small change in performance relative to the tree generated with centrally stored data. Experimental results on several well-known datasets are presented and compared with those of decision trees generated using Fisher's linear discriminant function with centrally stored data.
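The claim about homogeneously distributed data can be illustrated with a small sketch. It is not the paper's algorithm. It assumes that "homogeneously distributed" means each site holds different records over the same features, and it covers only the two-class case of a single split. Under those assumptions, each site can send per-class counts, feature sums, and sums of outer products, and the Fisher direction computed from the merged statistics matches the centrally computed one up to floating-point rounding. The function names (`local_stats`, `merge_stats`, `fisher_direction`) and the synthetic data are hypothetical.

```python
import numpy as np

def local_stats(X, y, classes):
    """Per-class count, feature sum, and sum of outer products at one site."""
    stats = {}
    for c in classes:
        Xc = X[y == c]
        stats[c] = (len(Xc), Xc.sum(axis=0), Xc.T @ Xc)
    return stats

def merge_stats(site_stats, classes):
    """Add up the per-class statistics reported by all sites."""
    merged = {}
    for c in classes:
        n = sum(s[c][0] for s in site_stats)
        sx = sum(s[c][1] for s in site_stats)
        sxx = sum(s[c][2] for s in site_stats)
        merged[c] = (n, sx, sxx)
    return merged

def fisher_direction(stats):
    """Two-class Fisher direction w = Sw^{-1} (m1 - m0) from summary statistics."""
    (n0, s0, q0), (n1, s1, q1) = stats[0], stats[1]
    m0, m1 = s0 / n0, s1 / n1
    # Within-class scatter: sum of x x^T minus n * m m^T, added over both classes.
    Sw = (q0 - n0 * np.outer(m0, m0)) + (q1 - n1 * np.outer(m1, m1))
    return np.linalg.solve(Sw, m1 - m0)

# Synthetic two-class data (assumption: 300 records, 4 features).
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = (rng.random(300) < 0.5).astype(int)
X[y == 1] += 1.0

# Central computation on the full data set.
w_central = fisher_direction(local_stats(X, y, [0, 1]))

# Distributed computation: three sites, same features, disjoint records.
sites = zip(np.array_split(X, 3), np.array_split(y, 3))
site_stats = [local_stats(Xs, ys, [0, 1]) for Xs, ys in sites]
w_distributed = fisher_direction(merge_stats(site_stats, [0, 1]))

print(np.allclose(w_central, w_distributed))
```

Only the summary statistics leave each site, not the raw records. The heterogeneous case, where sites hold different features, does not decompose this way because cross-site feature covariances are not available locally, which is consistent with the approximation mentioned in the abstract.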
Keywords: Distributed data mining, Decision tree, Fisher linear discriminant
Review history: Received 4 April 2008, Revised 13 January 2009, Accepted 30 January 2009, Available online 12 February 2009.
Official URL: https://doi.org/10.1016/j.patcog.2009.01.033