Regularizing conjunctive features for classification

作者:

Highlights:

摘要

We consider the feature-generation task wherein we are given a database with entities labeled as positive and negative examples, and we want to find feature queries that linearly separate the two sets of examples. We focus on conjunctive feature queries, and explore two problems: (a) deciding if separating feature queries exist (separability), and (b) generating such queries when they exist. To restrict the complexity of the generated classifiers, we explore various ways of regularizing them by limiting their dimension, the number of joins in feature queries, and their generalized hypertreewidth (ghw). We show that the separability problem is tractable for bounded ghw; yet, the generation problem is not because feature queries might be too large. So, we explore a third problem: classifying new entities without necessarily generating the feature queries. Interestingly, in the case of bounded ghw we can efficiently classify without explicitly generating such queries.

论文关键词:Classification,Feature generation,Conjunctive queries,Separability,Generalized hypertree width

论文评审过程:Received 1 August 2019, Revised 17 August 2020, Accepted 24 January 2021, Available online 17 February 2021, Version of Record 19 February 2021.

论文官网地址:https://doi.org/10.1016/j.jcss.2021.01.003