Feature selection based on term frequency deviation rate for text classification
Authors: Hongfang Zhou, Yiming Ma, Xiang Li
Abstract
Feature selection is a technique for selecting a subset of the most relevant features for model training. In this paper, a new concept, the term frequency deviation rate (TDR), is first proposed to improve classification accuracy. Then, a TDR-based feature selection algorithm for text classification is presented. Finally, extensive experiments are conducted on seven datasets (K1a, K1b, WAP, R52, R8, 20NewGroups, and Cade12) with two classifiers, Naive Bayes and Support Vector Machine. The experimental results indicate that the new approach can improve classification accuracy by 7.9% on average.
Keywords: Text classification, Feature selection, Term frequency, Document frequency, Deviation ratio
Paper URL: https://doi.org/10.1007/s10489-020-01937-4