Arabic machine reading comprehension on the Holy Qur’an using CL-AraBERT
Authors:
Highlights:
• We developed the first reading comprehension system on the Qur'an.
• The QRCD dataset is introduced as the first Qur'anic Reading Comprehension Dataset.
• CL-AraBERT is developed as a language model pre-trained on a Classical Arabic dataset.
• Cross-lingual transfer learning from Modern Standard Arabic (MSA) to Classical Arabic is leveraged.
• A new rank-based evaluation measure is proposed that gives credit for partial matches.
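The highlights do not define the rank-based measure. The Python sketch below shows one plausible way a rank-based score can give partial-match credit. It is a hypothetical illustration, not the paper's definition: it assumes whitespace tokenization, token-overlap F1 as the partial-match score, and credit taken from the first ranked answer that overlaps any gold answer, divided by that answer's rank. The names `token_f1` and `partial_reciprocal_rank` are invented for this example.

```python
from collections import Counter


def token_f1(pred: str, gold: str) -> float:
    """Token-overlap F1 between a predicted and a gold answer (whitespace tokens).

    Assumption: no Arabic-specific normalization (e.g. diacritic removal) is done.
    """
    p, g = pred.split(), gold.split()
    overlap = sum((Counter(p) & Counter(g)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(p)
    recall = overlap / len(g)
    return 2 * precision * recall / (precision + recall)


def partial_reciprocal_rank(ranked_answers: list[str], gold_answers: list[str]) -> float:
    """Hypothetical partial-credit reciprocal rank.

    Walks the ranked list top-down; the first answer with non-zero overlap against
    any gold answer contributes its best F1 divided by its 1-based rank.
    Returns 0.0 if no ranked answer overlaps a gold answer.
    """
    for rank, pred in enumerate(ranked_answers, start=1):
        best = max((token_f1(pred, g) for g in gold_answers), default=0.0)
        if best > 0:
            return best / rank
    return 0.0


if __name__ == "__main__":
    # Rank 1 misses; rank 2 partially matches (F1 = 0.8), so the score is 0.8 / 2.
    print(partial_reciprocal_rank(["a b", "x y"], ["x y z"]))
```

Under exact-match reciprocal rank the example above would score 0, because no ranked answer equals the gold answer. The partial-credit variant still rewards the near-miss at rank 2.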
Keywords: Classical Arabic, Reading comprehension, Answer extraction, Partial matching evaluation, Pre-trained language models, Cross-lingual transfer learning
Review history: Received 26 February 2022, Revised 1 August 2022, Accepted 18 August 2022, Available online 9 September 2022, Version of Record 9 September 2022.
DOI: https://doi.org/10.1016/j.ipm.2022.103068