GAMoN: Discovering M-of-N{¬,∨} hypotheses for text classification by a lattice-based Genetic Algorithm

Authors:

Abstract

While rule-based text classifiers have a long history, to the best of our knowledge no M-of-N-based approach to text categorization has so far been proposed. In this paper we argue that M-of-N hypotheses are particularly suitable for modeling the text classification task because of the so-called “family resemblance” metaphor: “the members (i.e., documents) of a family (i.e., category) share some small number of features, yet there is no common feature among all of them. Nevertheless, they resemble each other”. Starting from this conjecture, we provide a sound extension of the M-of-N approach with negation and disjunction, called M-of-N{¬,∨}, which makes it possible to better fit the true structure of the data. Based on a thorough theoretical study, we show that the M-of-N{¬,∨} hypothesis space has two partial orders that form complete lattices.
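
To make the “family resemblance” idea concrete, here is a minimal sketch of an M-of-N test over document terms. The abstract does not give the formal semantics of M-of-N{¬,∨}, so the function `m_of_n_neg_or` is only one assumed reading (negated terms must be absent, and a hypothesis is a disjunction of M-of-N clauses). All names and the toy vocabulary are hypothetical and are not taken from the paper.

```python
from typing import Iterable, List, Set, Tuple


def m_of_n(doc_terms: Iterable[str], terms: Iterable[str], m: int) -> bool:
    """Return True if at least m of the given terms occur in the document."""
    return len(set(doc_terms) & set(terms)) >= m


def m_of_n_neg_or(
    doc_terms: Iterable[str],
    disjuncts: List[Tuple[Set[str], int]],
    negative_terms: Set[str],
) -> bool:
    """Assumed reading of the extension, not the paper's definition:
    reject if any negative term occurs, otherwise accept if at least
    one (terms, m) clause is satisfied."""
    doc = set(doc_terms)
    if doc & set(negative_terms):
        return False
    return any(m_of_n(doc, terms, m) for terms, m in disjuncts)


# Toy category vocabulary (hypothetical).
sports_terms = {"goal", "match", "referee", "stadium"}

doc_a = {"goal", "match", "league"}
doc_b = {"referee", "stadium", "ticket"}
doc_c = {"goal", "election"}

print(m_of_n(doc_a, sports_terms, 2))
print(m_of_n(doc_b, sports_terms, 2))
print(m_of_n(doc_c, sports_terms, 2))
print(m_of_n_neg_or(doc_a, [(sports_terms, 2)], {"election"}))
```

In this toy setting `doc_a` and `doc_b` both satisfy the 2-of-4 test although they have no term in common, which is the situation the metaphor describes; `doc_c` matches only one term and is rejected.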

Keywords:

Review history: Received 16 November 2011, Revised 4 July 2012, Accepted 11 July 2012, Available online 20 July 2012.

Official paper URL: https://doi.org/10.1016/j.artint.2012.07.003