Clickbait detection on WeChat: A deep model integrating semantic and syntactic information

Authors:

Highlights:

Abstract

Online social media contains a large amount of clickbait, which uses tricks such as curiosity-arousing words and carefully designed sentence structures to lure users into clicking on hyperlinks for undisclosed benefits. Clickbait detection aims to identify these hyperlinks with automated algorithms. Previous research has usually focused on the semantic information of English clickbait corpora. In this paper, we construct a Chinese WeChat clickbait dataset and propose an effective deep learning method, multiple features for WeChat clickbait detection (MFWCD), which integrates semantic, syntactic, and auxiliary information. Based on the MFWCD framework, we propose two models with different parameter scales, MFWCD-BERT and MFWCD-BiLSTM, which encode title semantics using Bidirectional Encoder Representations from Transformers (BERT) and a lightweight Bidirectional Long Short-Term Memory (Bi-LSTM) network with an attention mechanism, respectively. In addition, we propose an improved Graph Attention Network (GAT) that aggregates the local syntactic structures of titles and uses an attention mechanism to capture the valuable structures. Finally, an auxiliary feature related to user reading behavior is introduced to obtain a richer title representation. Extensive experiments demonstrate the effectiveness and interpretability of MFWCD for clickbait detection, and it outperforms the baseline methods it is compared against.

Keywords: Clickbait detection, Natural language processing, Graph attention network, Attention mechanism, Syntactic structure

Review history: Received 8 September 2021, Revised 10 March 2022, Accepted 14 March 2022, Available online 25 March 2022, Version of Record 6 April 2022.

Official paper URL: https://doi.org/10.1016/j.knosys.2022.108605