An ensemble model for classifying idioms and literal texts using BERT and RoBERTa

Authors:

Highlights:

• Categorizing text into structured categories is a fundamental NLP task.

• We propose a predictive ensemble model to classify idioms and literals.

• We use BERT and RoBERTa, fine-tuned on the TroFi dataset.

• The model is tested on a newly created dataset of 1,470 idiomatic and literal expressions, annotated by domain experts.

Abstract

Categorizing text into structured categories is a fundamental NLP task. This paper proposes a predictive ensemble model to classify idioms and literals. The ensemble uses BERT and RoBERTa, fine-tuned on the TroFi dataset. The model is tested on a newly created dataset of 1,470 idiomatic and literal expressions, annotated by domain experts.
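The listing does not say how the two models' outputs are combined. As a minimal sketch only, assuming soft voting (averaging the class probabilities of the two classifiers), an ensemble decision for a single sentence might look like the following. The function names, the label order, and the logit values are hypothetical illustrations and are not taken from the paper.

```python
import math

def softmax(logits):
    """Convert raw classifier scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def soft_vote(logits_a, logits_b, labels=("literal", "idiom")):
    """Average the two models' class probabilities and pick the top class.

    Assumption: equal-weight soft voting. The paper may combine the
    models differently.
    """
    probs_a = softmax(logits_a)
    probs_b = softmax(logits_b)
    avg = [(a + b) / 2 for a, b in zip(probs_a, probs_b)]
    best = max(range(len(avg)), key=avg.__getitem__)
    return labels[best], avg

# Hypothetical logits standing in for the outputs of a fine-tuned BERT
# and a fine-tuned RoBERTa classifier on the same sentence.
bert_logits = [0.2, 1.1]      # leans "idiom"
roberta_logits = [1.5, -0.3]  # leans "literal", with more confidence

label, probs = soft_vote(bert_logits, roberta_logits)
print(label, [round(p, 3) for p in probs])
```

When the two models disagree, as in this made-up example, soft voting lets the more confident model decide, which is one common reason to average probabilities rather than count hard votes.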

Keywords: BERT, RoBERTa, Ensemble model, Idiom, Literal classification

Review history: Received 19 April 2021, Revised 25 August 2021, Accepted 5 September 2021, Available online 26 September 2021, Version of Record 26 September 2021.

Official paper URL: https://doi.org/10.1016/j.ipm.2021.102756