Single pass text classification by direct feature weighting
Authors: Hassan H. Malik, Dmitriy Fradkin, Fabian Moerchen
Abstract
The Feature Weighting Classifier (FWC) is an efficient multi-class classification algorithm for text data that uses Information Gain to directly estimate per-class feature weights in the classifier. The classifier requires only a single pass over the dataset to compute the per-class feature frequencies, is easy to implement, and has memory usage that is linear in the number of features. Experiments on 128 binary and multi-class text and web datasets show that FWC's performance is at least comparable to, and often better than, that of Naive Bayes, TWCNB, Winnow, Balanced Winnow, and linear SVM. On a large-scale web dataset with 12,294 classes and 135,973 training instances, FWC trained in 13 s and yielded classification performance comparable to that of a state-of-the-art multi-class SVM implementation, which took over 15 min to train.
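The general idea described in the abstract (one counting pass, then per-class weights derived from Information Gain) can be illustrated with a minimal sketch. The abstract does not give the exact weighting formula, so everything below is an assumption made for illustration: the one-vs-rest Information Gain, the sign rule, the additive scoring, and the names `entropy`, `train`, and `predict` are hypothetical and should not be read as the authors' FWC implementation.

```python
import math
from collections import Counter, defaultdict


def entropy(pos, total):
    """Binary entropy in bits of a set with `pos` positives out of `total`."""
    if total == 0 or pos == 0 or pos == total:
        return 0.0
    p = pos / total
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def train(docs):
    """Build per-class feature weights from (tokens, label) pairs.

    The data is read exactly once; only document counts are kept.
    """
    class_docs = Counter()                   # documents per class
    feat_docs = Counter()                    # documents containing each feature
    feat_class_docs = defaultdict(Counter)   # feature -> class -> document count
    n = 0
    for tokens, label in docs:               # the single pass over the dataset
        n += 1
        class_docs[label] += 1
        for f in set(tokens):
            feat_docs[f] += 1
            feat_class_docs[f][label] += 1

    # Assumed weighting: one-vs-rest Information Gain of the feature for the
    # class, made negative when the feature is under-represented in the class.
    weights = defaultdict(dict)
    for f, per_class in feat_class_docs.items():
        n_f = feat_docs[f]
        for c, n_fc in per_class.items():
            n_c = class_docs[c]
            h_class = entropy(n_c, n)
            h_given_f = (n_f / n) * entropy(n_fc, n_f) \
                + ((n - n_f) / n) * entropy(n_c - n_fc, n - n_f)
            gain = h_class - h_given_f
            sign = 1.0 if n_fc * n > n_f * n_c else -1.0
            weights[c][f] = sign * gain
    return weights, list(class_docs)


def predict(weights, classes, tokens):
    """Return the class whose summed feature weights are highest."""
    feats = set(tokens)
    return max(classes, key=lambda c: sum(weights[c].get(f, 0.0) for f in feats))
```

In this sketch, weights are stored only for feature and class pairs that co-occur at least once, so a feature never seen with a class contributes zero to that class's score. That is a simplification chosen to keep the table sparse, not something the abstract states.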
Keywords: Text classification, Feature weighting, Linear classifiers, Information gain, Scalable learning
Official paper URL: https://doi.org/10.1007/s10115-010-0317-9