Combining heterogeneous classifiers for relational databases

Authors:

Abstract

The practical use of machine learning is gaining strategic importance in enterprises seeking business intelligence. However, most enterprise data is distributed across multiple relational databases with expert-designed schemas. Applying traditional single-table machine learning techniques to such data not only incurs a computational penalty for converting it to a flat form (a mega-join), but also loses the human-specified semantic information present in the relations. In this paper, we present a practical, two-phase hierarchical meta-classification algorithm for relational databases that takes a semantic divide-and-conquer approach. We propose a recursive prediction aggregation technique over heterogeneous classifiers applied to individual database tables. The proposed algorithm was evaluated on three diverse datasets, namely the TPCH, PKDD and UCI benchmarks, and showed a considerable reduction in classification time without any loss of prediction accuracy.
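The idea of classifying each table separately and then aggregating predictions along a foreign key can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the toy `customers`/`orders` schema, the two stand-in base classifiers, the per-parent averaging, and the fixed-weight combination used as the meta-level step. The abstract does not specify the actual aggregation rule or meta-classifier.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy schema: customers (parent) 1-to-many orders (child).
customers = {
    1: {"income": 80},
    2: {"income": 20},
}
orders = [
    {"customer_id": 1, "amount": 900},
    {"customer_id": 1, "amount": 700},
    {"customer_id": 2, "amount": 50},
    {"customer_id": 2, "amount": 650},
]


def customer_classifier(row):
    """Stand-in base classifier for the parent table (assumed rule)."""
    return 0.9 if row["income"] >= 50 else 0.2


def order_classifier(row):
    """Stand-in base classifier of a different kind for the child table."""
    return min(row["amount"] / 1000.0, 1.0)


def aggregate_child_predictions(child_rows, classifier, key):
    """Phase 1: score each child row locally, then average scores per parent key."""
    scores = defaultdict(list)
    for row in child_rows:
        scores[row[key]].append(classifier(row))
    return {parent_id: mean(vals) for parent_id, vals in scores.items()}


def meta_classify(parent_rows, parent_classifier, child_scores,
                  weights=(0.5, 0.5), threshold=0.5):
    """Phase 2: combine each parent row's own score with its aggregated child score.

    The fixed weights are an assumption; a learned meta-classifier could be
    substituted here.
    """
    labels = {}
    for parent_id, row in parent_rows.items():
        own = parent_classifier(row)
        # A parent with no child rows falls back to its own score.
        child = child_scores.get(parent_id, own)
        combined = weights[0] * own + weights[1] * child
        labels[parent_id] = combined >= threshold
    return labels


child_scores = aggregate_child_predictions(orders, order_classifier, "customer_id")
predictions = meta_classify(customers, customer_classifier, child_scores)
```

No join between the two tables is materialized: each table is scored on its own, and only the per-key scores travel up the schema. With a deeper schema, the same aggregation step could be applied bottom-up, one foreign-key relationship at a time, which is one plausible reading of the recursive aggregation the abstract mentions.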

Keywords: Heterogeneous classifier, RDF, Relational data, RDBMS

Review history: Received 29 January 2012, Revised 19 May 2012, Accepted 24 June 2012, Available online 3 July 2012.

Paper URL: https://doi.org/10.1016/j.patcog.2012.06.015