ROLEX-SP: Rules of lexical syntactic patterns for free text categorization

Authors:

Highlights:

Abstract

Due to the rapid growth of free-text documents available in digital form, efficient techniques for automatic categorization are of great importance. In this paper, we present an efficient rule-based method for categorizing free-text documents. The contributions of this research are the formation of lexical syntactic patterns as basic classification features, a categorization framework that addresses the problem of classifying free text with minimal label description, and a learning algorithm that is efficient in terms of time complexity and F-measure. The ROLEX-SP framework concentrates on capturing the correct classes of text while reducing classification errors. We performed experiments to evaluate the proposed method and compared it with state-of-the-art methods on domain-specific sources of knowledge. The results indicate that ROLEX-SP outperforms the other methods in terms of standard F-measure in the medical domain, owing to the well-defined MeSH descriptions of medical categories.
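The abstract reports results in terms of the standard F-measure for a multi-class task but does not state how per-class scores are averaged. The following is a minimal sketch, not the authors' evaluation code, assuming single-label predictions and macro-averaging (the unweighted mean of per-class F1). The function names `f_measure`, `per_class_f1`, and `macro_f1` and the category labels in the usage example are hypothetical.

```python
from collections import Counter


def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Standard F-measure; beta = 1 gives F1, the harmonic mean of P and R."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def per_class_f1(gold, predicted):
    """F1 for each class, assuming exactly one label per document (hypothetical helper)."""
    labels = sorted(set(gold) | set(predicted))
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1  # correct class captured
        else:
            fp[p] += 1  # document wrongly assigned to class p
            fn[g] += 1  # document of class g was missed
    scores = {}
    for c in labels:
        prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        scores[c] = f_measure(prec, rec)
    return scores


def macro_f1(gold, predicted):
    """Unweighted mean of per-class F1 (macro-averaging is an assumption here)."""
    scores = per_class_f1(gold, predicted)
    return sum(scores.values()) / len(scores)


if __name__ == "__main__":
    # Hypothetical medical categories, for illustration only.
    gold = ["cardio", "cardio", "neuro", "onco"]
    pred = ["cardio", "neuro", "neuro", "onco"]
    print(per_class_f1(gold, pred))
    print(macro_f1(gold, pred))
```

In this toy run, "cardio" has perfect precision but half recall, "neuro" the reverse, so both score 2/3, "onco" scores 1, and the macro average is 7/9. Micro-averaging would pool the counts across classes instead and can give a different figure when classes are imbalanced.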

Keywords: Rule-based categorization, Lexical syntactic patterns, Induction rules, Multi-class classification, Feature imbalance

Article history: Received 21 March 2010, Revised 13 July 2010, Accepted 14 July 2010, Available online 18 July 2010.

Official paper URL: https://doi.org/10.1016/j.knosys.2010.07.005