Selection of the most relevant terms based on a max-min ratio metric for text classification
Authors:
Highlights:
• We illustrated weaknesses of the balanced accuracy and normalized difference measures.
• We proposed a new feature ranking metric called max-min ratio (MMR).
• MMR better estimates the true worth of a term under high class skew.
• We tested MMR against 8 well-known metrics on 6 datasets with 2 classifiers.
• MMR statistically outperforms the other metrics in 76% of macro F1 cases and 74% of micro F1 cases.
Keywords: Text classification, Feature selection, Feature ranking metrics
Review history: Received 8 August 2017, Revised 11 July 2018, Accepted 12 July 2018, Available online 19 July 2018, Version of Record 26 July 2018.
Official paper URL: https://doi.org/10.1016/j.eswa.2018.07.028