TCURL: Exploring hybrid transformer and convolutional neural network on phishing URL detection

Authors:

Abstract

Phishing is a growing threat in which cybercriminals create counterfeit websites to lure victims and obtain their sensitive information, such as login credentials and credit card numbers. According to the Q4 2021 Phishing Trends Report by the Anti-Phishing Working Group, the number of phishing attacks has tripled since early 2020. Conventional blacklist methods cannot protect users from attacks that use new phishing URLs. Traditional machine learning methods require complex feature engineering and generally cannot meet detection accuracy requirements. Deep learning methods based on fully convolutional networks attend only to local correlations, while those based on pure transformers attend only to long-term dependencies. To address these issues, we propose a hybrid network architecture, called TCURL, which considers both local and global correlations among the characters of URLs. TCURL has two parallel branches, a convolution branch and a transformer branch, and a fusion block that combines the information from the two branches. The convolution branch provides sufficient positional information, so no extra positional encoding is needed. Through experiments, we explored various design choices to optimize the model. The proposed method achieves accuracies of 96.92%, 99.77%, and 89.73% on three sampled datasets, outperforming existing methods.
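The two-branch idea described in the abstract can be illustrated with a small, untrained sketch in plain Python. This is not the TCURL implementation: the abstract does not specify layer sizes, the fusion operation, or the vocabulary, so the character set `VOCAB`, the width `DIM`, the kernel size of 3, single-head attention, and fusion by per-position concatenation followed by mean pooling are all assumptions made here for illustration. The weights are random, so the returned score carries no meaning; the point is only to show how a convolution branch (local, order-aware) and an attention branch (global, with no positional encoding) can process the same character embeddings and be merged.

```python
import math
import random

# Assumed character vocabulary; index 0 of the embedding table is "unknown".
VOCAB = "abcdefghijklmnopqrstuvwxyz0123456789-._~:/?#[]@!$&'()*+,;=%"
DIM = 8  # illustrative embedding width, not taken from the paper


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(x, w):
    # x has len(w) entries; w is rows x cols; result has cols entries.
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(w[0]))]


def embed(url, table):
    # str.find returns -1 for characters outside VOCAB, which maps to row 0.
    return [table[VOCAB.find(ch) + 1] for ch in url.lower()]


def conv_branch(x, kernels):
    # 1D convolution over positions with zero padding; one DIM x DIM matrix
    # per window offset. Captures local, order-dependent character patterns.
    k = len(kernels)
    zero = [0.0] * DIM
    padded = [zero] * (k // 2) + x + [zero] * (k // 2)
    out = []
    for t in range(len(x)):
        acc = [0.0] * DIM
        for o in range(k):
            for j, v in enumerate(matvec(padded[t + o], kernels[o])):
                acc[j] += v
        out.append([max(0.0, a) for a in acc])  # ReLU
    return out


def attention_branch(x, wq, wk, wv):
    # Single-head scaled dot-product self-attention, no positional encoding.
    q = [matvec(v, wq) for v in x]
    k = [matvec(v, wk) for v in x]
    val = [matvec(v, wv) for v in x]
    out = []
    for qi in q:
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(DIM) for kj in k]
        m = max(scores)
        w = [math.exp(s - m) for s in scores]  # numerically stable softmax
        z = sum(w)
        out.append([sum(w[t] / z * val[t][j] for t in range(len(x)))
                    for j in range(DIM)])
    return out


def fuse_and_score(local, global_, w_out):
    # Assumed fusion: concatenate per position, mean-pool, linear layer, sigmoid.
    fused = [l + g for l, g in zip(local, global_)]
    pooled = [sum(col) / len(fused) for col in zip(*fused)]
    logit = sum(p * w for p, w in zip(pooled, w_out))
    return 1.0 / (1.0 + math.exp(-logit))


def score_url(url, seed=0):
    # Random (untrained) weights; url must be non-empty.
    rng = random.Random(seed)
    table = rand_matrix(len(VOCAB) + 1, DIM, rng)
    kernels = [rand_matrix(DIM, DIM, rng) for _ in range(3)]
    wq, wk, wv = (rand_matrix(DIM, DIM, rng) for _ in range(3))
    w_out = [rng.uniform(-0.5, 0.5) for _ in range(2 * DIM)]
    x = embed(url, table)
    return fuse_and_score(conv_branch(x, kernels),
                          attention_branch(x, wq, wk, wv), w_out)
```

Calling `score_url("http://example.com/login")` returns a value strictly between 0 and 1. In this sketch the attention branch alone is insensitive to character order; any order information in the fused representation comes from the convolution branch, which loosely mirrors the abstract's remark that no extra positional encoding is needed.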

Keywords: Phishing, URL detection, Convolutional neural network, Transformer, Multi-head attention

Article history: Received 12 April 2022, Revised 24 September 2022, Accepted 25 September 2022, Available online 29 September 2022, Version of Record 21 October 2022.

Paper URL: https://doi.org/10.1016/j.knosys.2022.109955