Similarities between Arabic dialects: Investigating geographical proximity

Authors:

Highlights:

Abstract

The automatic classification of Arabic dialects is an ongoing research challenge. Recent work has defined dialects over increasingly narrow geographic areas, such as cities and provinces. This paper focuses on a related, yet relatively unexplored topic: how the geographical proximity of cities in Arab countries affects their dialectal similarity. Our approach has two parts: (1) comparing the textual similarity of dialects using cosine similarity and (2) measuring the geographical distance between locations. We study MADAR and NADI, two established datasets that cover Arabic dialects from many cities and provinces. Our results indicate that cities located in different countries may in fact be more dialectally similar than cities within the same country, depending on their geographical proximity. The correlation between dialectal similarity and city proximity suggests that cities that are closer together are more likely to share dialectal attributes, regardless of country borders. This finding matters for Arabic dialect research because it indicates that a more granular approach to dialect classification is essential to framing the problem of Arabic dialect identification.
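The two measurements described in the abstract, and the correlation between them, can be sketched roughly as follows. This is a minimal illustration rather than the authors' implementation: the whitespace tokenization, the raw term-count vectors, the haversine great-circle formula, and the Pearson coefficient are all assumptions, since the abstract names only cosine similarity and geographical distance. The function names are hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Cosine similarity between term-count vectors of two texts.

    Assumption: tokens are split on whitespace; the paper's actual
    text representation is not given in the abstract.
    """
    counts_a, counts_b = Counter(text_a.split()), Counter(text_b.split())
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[t] * counts_b[t] for t in shared)
    norm_a = math.sqrt(sum(v * v for v in counts_a.values()))
    norm_b = math.sqrt(sum(v * v for v in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no direction to compare
    return dot / (norm_a * norm_b)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees.

    Assumption: a spherical Earth with mean radius 6371 km.
    """
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    h = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot.
    return 2 * radius_km * math.asin(math.sqrt(min(1.0, h)))


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)
```

Under these assumptions, one would compute `cosine_similarity` and `haversine_km` for every pair of cities and pass the two resulting lists to `pearson`. A negative coefficient would be consistent with the abstract's claim that closer cities tend to share more dialectal attributes.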

Keywords: Arabic natural language processing, Arabic dialects, Geolocation, Textual similarity

Article history: Received 28 April 2021, Revised 17 September 2021, Accepted 18 September 2021, Available online 28 September 2021, Version of Record 28 September 2021.

Official paper URL: https://doi.org/10.1016/j.ipm.2021.102770