A multi-cascaded model with data augmentation for enhanced paraphrase detection in short texts

Authors:

Highlights:

• We present a strategy for augmenting existing paraphrase and non-paraphrase annotations in a sound manner for deep learning models.

• We develop a novel multi-cascaded learning model for robust paraphrase detection in both clean and noisy texts.

• We address both clean and noisy texts and report the best performance to date on benchmark datasets.

• We study the impact of different components of our multi-cascaded model on paraphrase detection performance.

• We study the impact of various data augmentation steps on paraphrase detection performance.

Keywords: Paraphrase detection, Deep learning, Data augmentation, Sentence similarity

Review history: Received 23 July 2019, Revised 2 January 2020, Accepted 8 January 2020, Available online 15 January 2020, Version of Record 15 January 2020.

Official paper URL: https://doi.org/10.1016/j.ipm.2020.102204