A feature selection method based on term frequency difference and positive weighting factor

Abstract

First, a new concept, the term frequency difference factor, is proposed to balance the influences of term frequency and document frequency on feature selection. Second, the idea of a positive weighting factor is advanced to balance the roles of document frequency in the positive and negative categories. Finally, a new feature selection algorithm based on these two concepts, PWTF-TCM, is presented. In the experiments, PWTF-TCM is compared with six popular algorithms on six datasets using two classifiers, Naive Bayes and Support Vector Machines. The experimental results show that PWTF-TCM outperforms the compared algorithms by 75% for Macro-F1 and 58.33% for Micro-F1. In addition, PWTF-TCM improves classification accuracy by 4.58% compared with the Trigonometric Comparison Measure.
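The abstract does not give the PWTF-TCM formula, so the following is only a minimal sketch of the raw statistics it refers to: term frequency and document frequency counted separately for the positive category and the negative (all other) categories. The function name `tf_df_by_category`, the toy documents, and the labels are hypothetical and are not taken from the paper.

```python
from collections import Counter

def tf_df_by_category(docs, labels, positive):
    """Return {term: (tf_pos, df_pos, tf_neg, df_neg)} for tokenized documents.

    Hypothetical helper for illustration only; it is NOT the PWTF-TCM score.
    """
    tf_pos, df_pos, tf_neg, df_neg = Counter(), Counter(), Counter(), Counter()
    for tokens, label in zip(docs, labels):
        # Choose the counters for the positive or the negative category.
        tf, df = (tf_pos, df_pos) if label == positive else (tf_neg, df_neg)
        tf.update(tokens)        # every occurrence counts toward term frequency
        df.update(set(tokens))   # each document counts once toward document frequency
    vocab = set(tf_pos) | set(tf_neg)
    return {t: (tf_pos[t], df_pos[t], tf_neg[t], df_neg[t]) for t in vocab}

# Toy corpus (assumed data): one "sport" document and two "politics" documents.
docs = [["goal", "goal", "match"], ["match", "vote"], ["vote", "vote", "vote"]]
labels = ["sport", "politics", "politics"]
stats = tf_df_by_category(docs, labels, positive="sport")
```

In this toy corpus, "vote" appears four times across two negative documents, so its term frequency and document frequency differ. That gap is the kind of information a purely document-frequency-based selector would ignore.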

Keywords: Text classification, Feature selection, Term frequency, Document frequency

Review history: Received 16 December 2021, Revised 14 June 2022, Accepted 28 July 2022, Available online 4 August 2022, Version of Record 17 August 2022.

Official URL: https://doi.org/10.1016/j.datak.2022.102060