AdaSL: An Unsupervised Domain Adaptation framework for Arabic multi-dialectal Sequence Labeling

Authors:

Highlights:

• We present AdaSL, an unsupervised framework for dialectal Arabic sequence labeling.

• We introduce a sub-word pooling aggregator that builds full word-level representations (see the sketch after this list).

• We apply AdaSL to multilingual and Arabic-specific pre-trained Transformer language models.

• We validate AdaSL on Named Entity Recognition (NER) and Part-of-Speech (POS) tagging.

• We achieve new state-of-the-art zero-shot performance for dialectal Arabic sequence labeling.
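The "sub-word pooling aggregator" highlight refers to combining the sub-word (e.g., WordPiece) states produced by a Transformer encoder into a single vector per word before sequence labeling. Below is a minimal, illustrative sketch of one common variant, mean pooling, assuming PyTorch and the HuggingFace transformers library; the model name, example words, and pooling choice are assumptions for illustration, not the authors' exact implementation.

```python
# Hedged sketch (not the paper's code): mean-pool sub-word hidden states
# into one vector per word, a typical "sub-word pooling" aggregator for
# word-level sequence labeling. Model name is illustrative only.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModel.from_pretrained("bert-base-multilingual-cased")

words = ["القاهرة", "مدينة", "كبيرة"]  # example pre-tokenized Arabic words
encoding = tokenizer(words, is_split_into_words=True, return_tensors="pt")

with torch.no_grad():
    hidden = model(**encoding).last_hidden_state[0]  # (num_subwords, hidden_size)

word_ids = encoding.word_ids(0)  # maps each sub-word position to its word index (None for special tokens)
word_vectors = []
for idx in range(len(words)):
    positions = [i for i, w in enumerate(word_ids) if w == idx]
    word_vectors.append(hidden[positions].mean(dim=0))  # average the word's sub-word states

word_matrix = torch.stack(word_vectors)  # (num_words, hidden_size): one representation per word
```

Other pooling choices (first sub-word, max, or attention-weighted pooling) plug into the same word-index bookkeeping.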

Keywords: Dialectal Arabic, Arabic natural language processing, Domain adaptation, Multi-dialectal sequence labeling, Named entity recognition, Part-of-speech tagging, Zero-shot transfer learning

Article history: Received 15 December 2021, Revised 24 April 2022, Accepted 25 April 2022, Available online 6 May 2022, Version of Record 6 May 2022.

DOI: https://doi.org/10.1016/j.ipm.2022.102964