MDLText: An efficient and lightweight text classifier

Authors:

Highlights:

• A novel multinomial text classification method based on the minimum description length principle is proposed.

• The proposed approach is efficient, lightweight, scalable, multiclass, and sufficiently robust to prevent overfitting.

• Experiments were performed on forty-five text corpora in both batch learning and online learning contexts.

• The results indicate that our proposed approach outperformed the best-known benchmark text classification techniques.
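
The minimum description length idea behind the first highlight can be sketched as follows: each class keeps a model of its documents, and a new document is assigned to the class under which it has the shortest code length. The sketch below is a simplified illustration under stated assumptions and is not the MDLText formula from the paper. The class name `MDLSketchClassifier` and its methods are hypothetical. It uses only the data code length L(d|c) = -Σ n_t log2 P(t|c) under an add-one-smoothed multinomial term model and omits any model-complexity term. The `partial_fit` method updates counts one document at a time, which loosely mirrors the online learning setting mentioned in the highlights.

```python
import math
from collections import Counter, defaultdict


class MDLSketchClassifier:
    """Hypothetical MDL-style text classifier (illustrative, NOT MDLText).

    Each class stores term frequencies. A document is assigned to the class
    whose add-one-smoothed multinomial model encodes it in the fewest bits.
    """

    def __init__(self):
        self.term_counts = defaultdict(Counter)  # class -> term frequencies
        self.total_terms = Counter()             # class -> number of tokens seen
        self.vocab = set()                       # all terms seen in training

    def partial_fit(self, tokens, label):
        # Online update: absorb one labelled document without retraining.
        counts = Counter(tokens)
        self.term_counts[label].update(counts)
        self.total_terms[label] += sum(counts.values())
        self.vocab.update(counts)

    def code_length(self, tokens, label):
        # L(d | c) = -sum_t n_t * log2 P(t | c), with add-one smoothing;
        # the extra +1 in v leaves some probability for unseen terms.
        v = len(self.vocab) + 1
        denom = self.total_terms[label] + v
        bits = 0.0
        for term, n in Counter(tokens).items():
            p = (self.term_counts[label][term] + 1) / denom
            bits -= n * math.log2(p)
        return bits

    def predict(self, tokens):
        # The class that yields the shortest description length wins.
        return min(self.term_counts, key=lambda c: self.code_length(tokens, c))


if __name__ == "__main__":
    clf = MDLSketchClassifier()
    clf.partial_fit("cheap pills buy now".split(), "spam")
    clf.partial_fit("meeting agenda for monday".split(), "ham")
    print(clf.predict("buy cheap pills".split()))  # → spam
```

Because the model is a set of per-class term counters, both training and prediction are linear in the document length, and adding a new class only requires calling `partial_fit` with a new label. A full MDL treatment would also charge for the cost of describing each class model. This sketch leaves that term out.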

Keywords: Text categorization, Minimum description length, Classification, Machine learning, Natural language processing

Article history: Received 28 June 2016, Revised 15 November 2016, Accepted 25 November 2016, Available online 25 November 2016, Version of Record 12 January 2017.

Official paper URL: https://doi.org/10.1016/j.knosys.2016.11.018