SDRS: A new lossless dimensionality reduction for text corpora
Authors:
Highlights:
• The need to migrate from token-based to synset-based representations to achieve better spam-filtering performance.
• A review of current synset-based feature reduction schemes and representations.
• Introduction of the SDRS feature reduction process, based on the NSGA-II algorithm and semantic taxonomic relations between tokens.
• Design and execution of an experimental protocol to test the suitability of the SDRS dimensionality reduction method.
Keywords: Spam filtering, Token-based representation, Synset-based representation, Semantic-based feature reduction, Multi-objective evolutionary algorithms
Review history: Received 17 December 2019, Revised 10 February 2020, Accepted 14 March 2020, Available online 21 March 2020, Version of Record 21 March 2020.
Official paper URL: https://doi.org/10.1016/j.ipm.2020.102249