Machine learning based phishing detection from URLs

Authors:

Highlights:

• Use of seven different classification algorithms and NLP-based features.

• A large URL dataset is produced and shared (36,400 legitimate and 37,175 phishing URLs).

• Real-time and language-independent classification algorithms.

• Feature-rich classifiers with Word Vectors, NLP-based and Hybrid features.

• The proposed approach reaches an accuracy of 97.98%.

Keywords: Cyber security, Phishing attack, Machine learning, Classification algorithms, Cyber attack detection

Review history: Received 7 May 2018, Revised 25 July 2018, Accepted 12 September 2018, Available online 18 September 2018, Version of Record 12 October 2018.

Official paper URL: https://doi.org/10.1016/j.eswa.2018.09.029