Selection of the most relevant terms based on a max-min ratio metric for text classification

Authors:

Highlights:

• We illustrated weaknesses of the balanced accuracy and normalized difference measures.

• We proposed a new feature ranking metric called max-min ratio (MMR).

• MMR better estimates the true worth of a term under high class skew.

• We tested MMR against 8 well-known metrics on 6 datasets with 2 classifiers.

• MMR statistically outperforms the other metrics in 76% of macro-F1 cases and 74% of micro-F1 cases.
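The highlights name three term-ranking metrics without giving their formulas. The sketch below assumes the definitions commonly used in this line of feature-selection work, computed from a term's one-vs-rest document counts: tpr = tp/(tp+fn), fpr = fp/(fp+tn), balanced accuracy ACC2 = |tpr − fpr|, NDM = |tpr − fpr| / min(tpr, fpr), and MMR = max(tpr, fpr) · |tpr − fpr| / min(tpr, fpr). The MMR expression in particular is an assumption, not stated in this summary; verify it against the full paper. The function names and the smoothing constant `EPS` are hypothetical.

```python
"""Sketch of three term-ranking metrics for one-vs-rest text classification.

ASSUMED definitions (not given in this summary; check the paper):
  tpr  = tp / (tp + fn)   share of positive-class docs containing the term
  fpr  = fp / (fp + tn)   share of negative-class docs containing the term
  ACC2 = |tpr - fpr|
  NDM  = |tpr - fpr| / min(tpr, fpr)
  MMR  = max(tpr, fpr) * |tpr - fpr| / min(tpr, fpr)
"""

# Hypothetical smoothing constant: avoids division by zero when min(tpr, fpr) == 0.
EPS = 1e-6


def rates(tp: int, fp: int, fn: int, tn: int) -> tuple[float, float]:
    """Return (tpr, fpr) for a term, guarding against empty classes."""
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return tpr, fpr


def acc2(tp: int, fp: int, fn: int, tn: int) -> float:
    """Balanced accuracy measure: absolute gap between the two rates."""
    tpr, fpr = rates(tp, fp, fn, tn)
    return abs(tpr - fpr)


def ndm(tp: int, fp: int, fn: int, tn: int) -> float:
    """Normalized difference measure: the gap divided by the smaller rate."""
    tpr, fpr = rates(tp, fp, fn, tn)
    return abs(tpr - fpr) / max(min(tpr, fpr), EPS)


def mmr(tp: int, fp: int, fn: int, tn: int) -> float:
    """Max-min ratio (assumed form): NDM weighted by the larger rate."""
    tpr, fpr = rates(tp, fp, fn, tn)
    return max(tpr, fpr) * abs(tpr - fpr) / max(min(tpr, fpr), EPS)


if __name__ == "__main__":
    # Hypothetical skewed corpus: 100 positive docs, 10,000 negative docs.
    rare = dict(tp=1, fp=0, fn=99, tn=10000)      # in one positive doc only
    common = dict(tp=60, fp=10, fn=40, tn=9990)   # in 60% of positives, 0.1% of negatives
    for name, counts in [("rare", rare), ("common", common)]:
        print(name, acc2(**counts), ndm(**counts), mmr(**counts))
```

Under these assumed definitions, NDM scores the rare term at roughly 10,000 and the common term at about 599, so a term seen in a single document outranks a strongly class-indicative one. Multiplying by max(tpr, fpr) gives the rare term about 100 and the common term about 359.4, reversing that order. This toy case only illustrates the kind of skew effect the highlights refer to; it is not the paper's experiment.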
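The results are reported as macro-F1 and micro-F1. As a minimal sketch, assuming a single-label multiclass setting, macro-F1 averages the per-class F1 scores while micro-F1 pools true positives, false positives, and false negatives across classes before computing one F1. The function names here are illustrative only.

```python
from collections import Counter


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_micro_f1(y_true, y_pred) -> tuple[float, float]:
    """Return (macro-F1, micro-F1) for single-label predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        return 0.0, 0.0
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gets a false positive
            fn[t] += 1  # true class gets a false negative
    # Macro: every class counts equally, however rare it is.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    # Micro: counts are pooled, so frequent classes dominate.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return macro, micro
```

For example, predicting the majority class for every document in `["a", "a", "a", "a", "b"]` gives a micro-F1 of 0.8 (equal to accuracy in the single-label case) but a macro-F1 of about 0.444, because the minority class scores 0. This is why the two measures can disagree on skewed datasets.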

Keywords: Text classification, Feature selection, Feature ranking metrics

Article history: Received 8 August 2017, Revised 11 July 2018, Accepted 12 July 2018, Available online 19 July 2018, Version of Record 26 July 2018.

Official paper URL: https://doi.org/10.1016/j.eswa.2018.07.028