Bengali text document categorization based on very deep convolution neural network

Highlights:

• Illustrated the development of a benchmark text corpus for low-resource languages.

• Presented an algorithm for optimisation of hyperparameters of embedding models.

• Evaluated several embedding models using semantic and syntactic similarity measures.

• Integrated embedding and very deep learning models to improve text classification.

• Evaluated the proposed and existing models on the built corpus for text classification.

Keywords: Intelligent systems, Natural language processing, Low resource language, Semantic feature extraction, Document categorization, Deep convolution network

Article history: Received 16 December 2020, Revised 29 April 2021, Accepted 8 June 2021, Available online 2 July 2021, Version of Record 7 July 2021.

Official paper URL: https://doi.org/10.1016/j.eswa.2021.115394