Integrating Character-level and Word-level Representation for Affect in Arabic Tweets

作者:

Highlights:

摘要

Affect tasks, which range from sentiment polarity classification to finer grained sentiment strength and emotional intensity detection, have become of increasing interest due to the vast amount of user-generated content and advanced learning models. Word representation models have been leveraged effectively within a variety of natural language processing tasks. However, these models are not always effective in the context of social media. When dealing with social media posts in Arabic, the use of Arabic dialects needs to be considered. Although using informal text to train word-level models can lead to the identification of words that convey the same meaning, these models are unable to capture the full extent of the words that are used in the real world due to out-of-vocabulary (OOV) words. The inability to identify such words is one of the main limitations of word-level models. One approach of overcoming OOV is through the use of character-level embeddings as they can effectively learn the vectors of word parts or character n-grams. This study uses a combination of character-level and word-level models to identify the most effective methods by which affective Arabic words in tweets can be represented semantically and morphologically. We evaluate our generated models and the proposed method by integrating them in a supervised learning framework that was used for a range of affect tasks and other related tasks. Our findings reveal that the developed models surpassed the performance of state-of-the-art Arabic pre-trained word embeddings over eight datasets. In addition, our models enhance previous state-of-the-art outcomes on tasks involving Arabic emotion intensity, outperforming the top-systems that used advanced ensemble learning models and several additional features.

论文关键词:Word-level embeddings,Character-level embeddings,Arabic tweets,Affect tasks

论文评审过程:Received 21 March 2021, Accepted 27 December 2021, Available online 6 January 2022, Version of Record 1 February 2022.

论文官网地址:https://doi.org/10.1016/j.datak.2021.101973