Rumor detection in Arabic tweets using semi-supervised and unsupervised expectation–maximization

Abstract

With the continued development of social networks, information spreads faster than ever. As a result, the reliability of that information has become a problem, since any user can publish whatever he or she wants. Automated systems that can detect fake content as quickly as the information is disseminated are urgently required. Rumor detection in Arabic-language social networks has lagged behind work on other languages, particularly English. In this paper, we address the problem of detecting rumors in Arabic tweets. We used a set of features extracted from the user and the content, and analyzed these features to determine their significance. Semi-supervised expectation–maximization (E–M) was used to train the proposed system on topics of newsworthy tweets. A comparison with supervised Gaussian Naïve Bayes (NB) showed that our semi-supervised system, trained with only a small amount of labeled data, outperforms Gaussian NB, achieving an accuracy of 78.6%. The performance of unsupervised E–M depends on the initial values; in one of our experiments, we achieved an F1 score of 80%.
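The abstract does not spell out how semi-supervised E–M trains the classifier. The following is a minimal sketch of one common formulation, assuming a Gaussian Naïve Bayes model fitted by E–M in which labeled tweets keep fixed one-hot labels and unlabeled tweets receive soft labels in each E-step. It further assumes numeric feature vectors, two classes, and equal weighting of labeled and unlabeled examples. The function names (`fit_weighted`, `posteriors`, `semi_supervised_em`, `predict`) and the toy data are hypothetical; they do not reproduce the authors' features or implementation.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical aliases for this sketch: a feature vector is a list of floats
# (e.g. user- and content-based tweet features), and a model is
# (class priors, per-class feature means, per-class feature variances).
Vector = Sequence[float]
Params = Tuple[List[float], List[List[float]], List[List[float]]]


def fit_weighted(X: List[Vector], resp: List[List[float]], n_classes: int,
                 var_floor: float = 1e-6) -> Params:
    """M-step: Gaussian NB parameters from soft (or one-hot) class memberships."""
    n_features = len(X[0])
    priors, means, variances = [], [], []
    for c in range(n_classes):
        w = [r[c] for r in resp]
        total = max(sum(w), 1e-12)  # guard against a class with zero weight
        priors.append(sum(w) / len(X))
        mu = [sum(wi * x[j] for wi, x in zip(w, X)) / total for j in range(n_features)]
        # Weighted variance plus a small floor so no Gaussian collapses to zero width.
        var = [sum(wi * (x[j] - mu[j]) ** 2 for wi, x in zip(w, X)) / total + var_floor
               for j in range(n_features)]
        means.append(mu)
        variances.append(var)
    return priors, means, variances


def posteriors(x: Vector, priors: List[float], means: List[List[float]],
               variances: List[List[float]]) -> List[float]:
    """E-step for one example: P(class | x) under the Gaussian NB model."""
    log_scores = []
    for p, mu, var in zip(priors, means, variances):
        s = math.log(p) if p > 0 else -math.inf
        # Naive Bayes: features are conditionally independent Gaussians per class.
        for xj, m, v in zip(x, mu, var):
            s += -0.5 * math.log(2 * math.pi * v) - (xj - m) ** 2 / (2 * v)
        log_scores.append(s)
    top = max(log_scores)  # log-sum-exp trick for numerical stability
    exps = [math.exp(s - top) for s in log_scores]
    z = sum(exps)
    return [e / z for e in exps]


def semi_supervised_em(X_lab: List[Vector], y_lab: List[int], X_unl: List[Vector],
                       n_classes: int = 2, n_iter: int = 20) -> Params:
    """Labeled points keep one-hot labels; unlabeled points get soft labels each E-step."""
    lab_resp = [[1.0 if c == y else 0.0 for c in range(n_classes)] for y in y_lab]
    X_all = list(X_lab) + list(X_unl)
    # Initialise the model from the small labeled set only.
    params = fit_weighted(list(X_lab), lab_resp, n_classes)
    for _ in range(n_iter):
        unl_resp = [posteriors(x, *params) for x in X_unl]            # E-step
        params = fit_weighted(X_all, lab_resp + unl_resp, n_classes)  # M-step
    return params


def predict(x: Vector, params: Params) -> int:
    """Return the most probable class index for feature vector x."""
    post = posteriors(x, *params)
    return max(range(len(post)), key=post.__getitem__)


if __name__ == "__main__":
    # Toy 2-D data (hypothetical, not from the paper): class 0 = non-rumor, class 1 = rumor.
    X_lab = [[0.0, 0.1], [0.2, -0.1], [5.0, 5.1], [4.8, 4.9]]
    y_lab = [0, 0, 1, 1]
    X_unl = [[0.1, 0.0], [-0.2, 0.2], [5.2, 5.0], [4.9, 5.3]]
    model = semi_supervised_em(X_lab, y_lab, X_unl)
    print(predict([0.0, 0.0], model), predict([5.0, 5.0], model))
```

Dropping the labeled terms and starting from arbitrary initial parameters turns this sketch into unsupervised E–M. Because E–M only converges to a local optimum of the likelihood, the resulting clusters then depend on that initialization, which is consistent with the abstract's observation about the unsupervised variant.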

Keywords: Rumor detection, Arabic, Semi-supervised, Unsupervised, Expectation–maximization, Twitter

Article history: Received 16 March 2019, Revised 10 August 2019, Accepted 12 August 2019, Available online 13 August 2019, Version of Record 25 October 2019.

Official paper URL: https://doi.org/10.1016/j.knosys.2019.104945