PhishWHO: Phishing webpage detection via identity keywords extraction and target domain name finder

Authors:

Highlights:

• Exploit URL patterns based on the proposed N-gram model to extract identity keywords

• Attain robustness in detecting phishing webpages hosted in any language

• Offer long-term effectiveness by leveraging a permanent phishing characteristic

• Achieve higher accuracy in finding the target identity by using compromise programming

• Suppress false positives by exploiting indirect identity relationships

Abstract

This paper proposes a phishing detection technique based on the difference between the target and actual identities of a webpage. The proposed phishing detection approach, called PhishWHO, can be divided into three phases. The first phase extracts identity keywords from the textual contents of the website, where a novel weighted URL tokens system based on the N-gram model is proposed. The second phase finds the target domain name by using a search engine, and the target domain name is selected based on identity-relevant features. In the final phase, a 3-tier identity matching system is proposed to determine the legitimacy of the query webpage. The overall experimental results suggest that the proposed system outperforms the conventional phishing detection methods considered.
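The first phase can be illustrated with a rough sketch. The abstract does not specify the weighting scheme, so everything below is an assumption made for illustration: the function names, the character trigram size, the `url_bonus` factor, and the scoring rule (word frequency plus the share of a word's character n-grams that also occur in the URL) are hypothetical and are not PhishWHO's actual method. The tokenizer is also ASCII-only for brevity, whereas the paper targets webpages in any language.

```python
import re
from collections import Counter
from urllib.parse import urlparse


def url_tokens(url):
    """Split the host and path of a URL into lowercase alphanumeric tokens."""
    parsed = urlparse(url)
    raw = (parsed.netloc + parsed.path).lower()
    return [t for t in re.split(r"[^a-z0-9]+", raw) if t]


def char_ngrams(text, n=3):
    """Return the set of character n-grams of a string."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def score_keywords(url, page_text, n=3, url_bonus=5.0):
    """Rank page words by frequency plus an n-gram overlap bonus against the URL.

    Hypothetical scoring rule, not the weighting defined in the paper.
    """
    # Collect every character n-gram that appears in any URL token.
    url_grams = set()
    for token in url_tokens(url):
        url_grams |= char_ngrams(token, n)

    # Count page words that are long enough to contain at least one n-gram.
    words = re.findall(r"[a-z0-9]+", page_text.lower())
    counts = Counter(w for w in words if len(w) >= n)

    scores = {}
    for word, freq in counts.items():
        grams = char_ngrams(word, n)
        # Share of this word's n-grams that also occur in the URL.
        overlap = len(grams & url_grams) / len(grams)
        scores[word] = freq + url_bonus * overlap
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    ranked = score_keywords(
        "http://paypal-secure.example.com/login",
        "Welcome to PayPal. Sign in to your PayPal account account account",
    )
    for word, score in ranked[:3]:
        print(word, score)
```

In this toy example the word "paypal" outranks the more frequent "account" because all of its trigrams also appear in the URL. A keyword ranked this way would then feed the second phase, where a search engine is queried to find the target domain name.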

Keywords: Phishing detection, Identity keywords, N-gram, Weighted URL tokens, Search engine

Article history: Received 20 October 2015, Revised 28 March 2016, Accepted 22 May 2016, Available online 1 June 2016, Version of Record 1 July 2016.

Official paper URL: https://doi.org/10.1016/j.dss.2016.05.005