Using the feature projection technique based on a normalized voting method for text classification

Authors:

Abstract

This paper proposes a new approach to text categorization based on a feature projection technique. In our approach, the training data are represented as the projections of the training documents onto each feature. Voting for a classification is carried out on the basis of the individual feature projections, and the final classification of a test document is determined by a majority vote over the individual classifications made by each feature. Our empirical results show that the proposed approach, text categorization using feature projections (TCFP), outperforms k-NN, Rocchio, and Naive Bayes. Above all, TCFP is a fast classifier: it is up to one hundred times faster than k-NN on the Newsgroups data set. It is also robust to noisy data. Because the TCFP algorithm is very simple, it is easy to implement and to train. For these reasons, TCFP can be a useful classifier for text categorization tasks that require fast execution, robustness, and high performance.
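The voting scheme described in the abstract can be illustrated with a minimal Python sketch. The abstract does not specify the term weighting or the normalization used in TCFP, so the choices here are assumptions made purely for illustration: raw term frequency as the feature weight, and each feature's votes normalized to sum to one across classes. The function names `train_projections` and `classify` are hypothetical and do not come from the paper.

```python
from collections import Counter, defaultdict


def train_projections(docs):
    """Build one projection per feature (term).

    docs: iterable of (tokens, label) pairs.
    Returns a mapping term -> Counter(label -> summed term frequency).
    Assumption: raw term frequency stands in for the paper's weighting.
    """
    projections = defaultdict(Counter)
    for tokens, label in docs:
        for term, tf in Counter(tokens).items():
            projections[term][label] += tf
    return projections


def classify(projections, tokens):
    """Let every feature of the test document vote, then pick the top class.

    Assumption: each feature's votes are normalized to sum to 1 over classes
    and scaled by the term's frequency in the test document.
    Returns None when no feature of the document was seen in training.
    """
    votes = Counter()
    for term, tf in Counter(tokens).items():
        per_class = projections.get(term)
        if not per_class:
            continue  # unseen feature: it casts no vote
        total = sum(per_class.values())
        for label, weight in per_class.items():
            votes[label] += tf * weight / total
    if not votes:
        return None
    return max(votes, key=votes.get)


if __name__ == "__main__":
    training = [
        (["ball", "goal", "team"], "sports"),
        (["team", "match", "goal"], "sports"),
        (["vote", "election", "party"], "politics"),
        (["party", "vote", "team"], "politics"),
    ]
    model = train_projections(training)
    print(classify(model, ["goal", "team", "match"]))
```

In this sketch, classification only touches the projections of the features that occur in the test document, rather than comparing the document against every training document as k-NN does. That is one plausible reading of why a projection-based classifier can be fast, though the sketch makes no claim to reproduce the paper's reported speedup.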

Keywords: Text categorization, Text classifier, Feature projections, Instance-based learning, k-NN

Review history: Received 20 August 2002, Accepted 14 April 2003, Available online 11 June 2003.

Paper URL: https://doi.org/10.1016/S0306-4573(03)00029-3