Contextual feature selection for text classification

Authors:

Abstract

We present a simple approach for the classification of “noisy” documents using bigrams and named entities. The approach combines conventional feature selection with a contextual approach to filter out passages around selected features. Originally designed for call-for-tender documents, the method can be useful for other web collections that also contain non-topical content. Experiments are conducted on our in-house collection as well as on the 4-Universities data set, Reuters-21578 and 20 Newsgroups. We find a significant improvement on our collection and the 4-Universities data set (10.9% and 4.1%, respectively). Although the best results are obtained by combining bigrams and named entities, the impact of the latter is not found to be significant.
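The two-stage idea in the abstract (select features, then keep only the text around them) can be illustrated with a minimal sketch. Everything concrete here is an assumption and not taken from the paper: the function names `select_bigrams` and `keep_context` are hypothetical, the document-frequency contrast score is a stand-in for whatever conventional feature selection the authors used, the token window is an assumed definition of "passage", and "filter out" is read as "extract". Named entities are not modelled.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bigrams(tokens):
    """Return adjacent token pairs."""
    return list(zip(tokens, tokens[1:]))


def select_bigrams(docs, labels, k):
    """Pick the k bigrams whose document frequency differs most between
    one class and the rest (an assumed stand-in scoring function)."""
    classes = set(labels)
    n = Counter(labels)
    df = {c: Counter() for c in classes}
    for text, label in zip(docs, labels):
        # Count each bigram once per document.
        df[label].update(set(bigrams(tokenize(text))))
    vocab = set().union(*df.values())
    total = len(docs)
    scores = {}
    for bg in vocab:
        best = 0.0
        for c in classes:
            inside = df[c][bg] / n[c]
            n_out = total - n[c]
            outside = (
                sum(df[o][bg] for o in classes if o != c) / n_out
                if n_out else 0.0
            )
            best = max(best, inside - outside)
        scores[bg] = best
    # Sort by descending score, then alphabetically so ties are deterministic.
    ranked = sorted(vocab, key=lambda bg: (-scores[bg], bg))
    return set(ranked[:k])


def keep_context(text, selected, window=3):
    """Keep only tokens within `window` tokens of a selected bigram."""
    tokens = tokenize(text)
    keep = [False] * len(tokens)
    for i, bg in enumerate(zip(tokens, tokens[1:])):
        if bg in selected:
            lo = max(0, i - window)
            hi = min(len(tokens), i + 2 + window)
            for j in range(lo, hi):
                keep[j] = True
    return [t for t, flag in zip(tokens, keep) if flag]
```

In this sketch, navigation text shared by every document (for example "click here to login") scores zero and is dropped, while tokens near class-specific bigrams survive and can be passed to any downstream classifier.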

Keywords: Classification, Named entities, Feature selection, Text filtering

Article history: Received 28 May 2006, Accepted 25 July 2006, Available online 24 October 2006.

Official paper URL: https://doi.org/10.1016/j.ipm.2006.07.006