Porn2Vec: A robust framework for detecting pornographic websites based on contrastive learning

Abstract

Pornographic websites have become one of the largest sources of vulgar content, which seriously threatens the mental and physical health of juveniles. Unfortunately, existing pornography detection approaches are ineffective against pornographic websites armed with adversarial examples. In this paper, we propose Porn2Vec, a robust end-to-end framework for detecting pornographic websites using contrastive learning. Specifically, we first model pornographic websites as a heterogeneous graph consisting of websites, webpages, images, texts, and the interactions among them, and formulate pornographic website detection as a node classification task on this graph. We then present a novel contrastive-learning-based heterogeneous graph embedding method that learns high-level representations of websites by jointly aggregating image-based, text-based, and structure-based features. Finally, the learned website features are fed into a neural network to train a model for automatic pornographic website detection. Experimental results show that Porn2Vec outperforms existing state-of-the-art methods and performs more robustly when detecting well-disguised pornographic websites equipped with adversarial examples.
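The pipeline described in the abstract (typed graph, per-site aggregation of image, text, and structure features, contrastive objective) can be illustrated with a minimal, stdlib-only Python sketch. It is not the authors' implementation. The relation names (`has_page`, `has_image`, `has_text`, `links_to`), the helpers (`HeteroGraph`, `website_views`, `website_embedding`, `info_nce`, `demo_graph`), mean pooling, the two structural counts, and the use of a standard InfoNCE loss between an image view and a text view with temperature `tau=0.5` are all hypothetical choices made for illustration. The raw feature vectors stand in for the outputs of image and text encoders.

```python
import math
from collections import defaultdict

# Hypothetical sketch, NOT Porn2Vec's actual model: relation names, mean pooling,
# structural counts and the InfoNCE objective are illustrative assumptions.


class HeteroGraph:
    """Typed nodes plus relation-labelled edges; relation names are hypothetical."""

    def __init__(self):
        self.node_type = {}             # node id -> "website" | "webpage" | "image" | "text"
        self.features = {}              # node id -> raw feature vector (images/texts only)
        self.edges = defaultdict(set)   # (source id, relation) -> set of target ids

    def add_node(self, nid, ntype, feat=None):
        self.node_type[nid] = ntype
        if feat is not None:
            self.features[nid] = list(feat)

    def add_edge(self, src, relation, dst):
        self.edges[(src, relation)].add(dst)

    def neighbours(self, nid, relation):
        # .get avoids inserting empty entries into the defaultdict on lookup
        return sorted(self.edges.get((nid, relation), ()))


def mean_pool(vectors, dim):
    """Element-wise mean; a zero vector when a site has no content of that type."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def website_views(g, site, dim):
    """Image view and text view of a website, pooled over all of its webpages."""
    images, texts = [], []
    for page in g.neighbours(site, "has_page"):
        images += [g.features[i] for i in g.neighbours(page, "has_image")]
        texts += [g.features[t] for t in g.neighbours(page, "has_text")]
    return mean_pool(images, dim), mean_pool(texts, dim)


def website_embedding(g, site, dim):
    """Concatenate image view, text view and two structural counts (pages, outgoing links)."""
    img, txt = website_views(g, site, dim)
    pages = g.neighbours(site, "has_page")
    links = sum(len(g.neighbours(p, "links_to")) for p in pages)
    return img + txt + [float(len(pages)), float(links)]


def cosine(a, b):
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def info_nce(anchors, positives, tau=0.5):
    """Mean InfoNCE loss: anchors[i] should match positives[i] against all other positives."""
    total = 0.0
    for i, a in enumerate(anchors):
        logits = [cosine(a, p) / tau for p in positives]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_sum_exp - logits[i]
    return total / len(anchors)


def demo_graph():
    """Two toy websites, each with one webpage holding one image node and one text node."""
    g = HeteroGraph()
    toy = {"a": ([1.0, 0.0], [0.9, 0.1]), "b": ([0.0, 1.0], [0.1, 0.9])}
    for name, (img_feat, txt_feat) in toy.items():
        g.add_node(f"site_{name}", "website")
        g.add_node(f"page_{name}", "webpage")
        g.add_node(f"img_{name}", "image", img_feat)
        g.add_node(f"txt_{name}", "text", txt_feat)
        g.add_edge(f"site_{name}", "has_page", f"page_{name}")
        g.add_edge(f"page_{name}", "has_image", f"img_{name}")
        g.add_edge(f"page_{name}", "has_text", f"txt_{name}")
    g.add_edge("page_a", "links_to", "page_b")
    return g


g = demo_graph()
views = [website_views(g, s, dim=2) for s in ("site_a", "site_b")]
image_views = [v[0] for v in views]
text_views = [v[1] for v in views]
print("aligned loss:", info_nce(image_views, text_views))
print("shuffled loss:", info_nce(image_views, text_views[::-1]))
```

Here the encoder is fixed mean pooling, so nothing is trained; the loss only shows that matching views of the same site score lower than mismatched ones. In a learned pipeline of the kind the abstract describes, trainable encoders would be optimized against such an objective and the resulting website embeddings passed to a classifier.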

Keywords: Pornography detection, Heterogeneous graph, Contrastive learning, Robustness

Article history: Received 5 January 2021, Revised 6 July 2021, Accepted 8 July 2021, Available online 13 July 2021, Version of Record 16 July 2021.

Official paper page: https://doi.org/10.1016/j.knosys.2021.107296