Improving classification models with taxonomy information

Authors:

Abstract

Classification is an established data mining problem that has been extensively investigated by the research community. Because raw data is often unsuitable for training a classifier as it is, several preprocessing steps are commonly integrated into the data mining and knowledge discovery process before classification is applied. This paper investigates the usefulness of integrating taxonomy information into classifier construction. In particular, it presents a general-purpose strategy to improve the accuracy of structured data classification by enriching data with semantics-based knowledge provided by a taxonomy (i.e., a set of is-a hierarchies) built over data items. The proposed approach may be particularly useful to experts who can directly access or easily infer meaningful taxonomy models over the analyzed data. To demonstrate the benefit of using taxonomies in contemporary classification methods, we also present a generalized version of a state-of-the-art associative classifier, which includes generalized (high-level) rules in the classification model. Experiments show the effectiveness of the proposed approach in improving the accuracy of state-of-the-art classifiers, both associative and non-associative.
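As an illustration of what enriching data with taxonomy knowledge could look like, the following is a minimal sketch and not the method of the paper. It assumes a taxonomy stored as a child-to-parent map (one is-a hierarchy) and adds one extra feature per ancestor level to each record. The names `TAXONOMY`, `ancestors`, and `enrich`, the `@level` feature naming, and the sample data are all hypothetical.

```python
from typing import Dict, Hashable, List

# Hypothetical is-a hierarchy: each item maps to its direct generalization.
TAXONOMY: Dict[Hashable, Hashable] = {
    "Turin": "Italy",
    "Milan": "Italy",
    "Italy": "Europe",
    "Paris": "France",
    "France": "Europe",
}


def ancestors(item: Hashable, taxonomy: Dict[Hashable, Hashable]) -> List[Hashable]:
    """Return the generalizations of `item`, from most to least specific."""
    chain: List[Hashable] = []
    seen = {item}
    while item in taxonomy:
        item = taxonomy[item]
        if item in seen:  # guard against accidental cycles in the map
            break
        seen.add(item)
        chain.append(item)
    return chain


def enrich(record: Dict[str, Hashable],
           taxonomy: Dict[Hashable, Hashable]) -> Dict[str, Hashable]:
    """Copy `record` and add one feature per ancestor level of each value.

    Values that do not appear in the taxonomy are left unchanged.
    """
    enriched = dict(record)
    for attribute, value in record.items():
        for level, ancestor in enumerate(ancestors(value, taxonomy), start=1):
            enriched[f"{attribute}@level{level}"] = ancestor
    return enriched


if __name__ == "__main__":
    print(enrich({"city": "Turin", "age": "30"}, TAXONOMY))
```

In this sketch the enriched records could be passed to any classifier in place of the raw ones. How the higher-level features are selected and weighted is a design choice this example leaves open.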

Keywords: Data mining, Classification, Taxonomies, Generalized association rules

Article history: Received 29 August 2011, Revised 17 January 2013, Accepted 17 January 2013, Available online 26 January 2013.

Official paper URL: https://doi.org/10.1016/j.datak.2013.01.005