Identifying legitimate Web users and bots with different traffic profiles — an Information Bottleneck approach

Abstract

Recent studies have reported that about half of Web users nowadays are intelligent agents (Web bots). Many bots are impersonators operating at a very high level of sophistication, trying to emulate the navigational behavior of legitimate users (humans). Moreover, bot technology continues to evolve, which makes bot detection even harder. To deal with this problem, many advanced methods for differentiating bots from humans have been proposed, a large part of which rely on supervised machine learning techniques. In this paper, we propose a novel approach to identifying various profiles of bots and humans, which combines feature selection and unsupervised learning of HTTP-level traffic patterns to develop a user session classification model. Session clustering is performed with the agglomerative Information Bottleneck (aIB) algorithm, as well as with some other reference algorithms. The model is then used to assign new sessions to one of the profiles and to label the sessions as performed by bots or humans. An extensive experimental study, based on real server log data, demonstrates the ability of aIB clustering to distinguish user profiles and confirms the high performance of the classification model in terms of accuracy, F1, recall, and precision.
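
To make the clustering step concrete, the following is a minimal sketch of greedy agglomerative Information Bottleneck clustering in its standard form: start with one cluster per item and repeatedly merge the pair whose merger loses the least information, measured as the combined cluster weight times a weighted Jensen-Shannon divergence. It is not the authors' implementation. It assumes, purely for illustration, that each session is summarized as a probability distribution over three hypothetical request categories and that all sessions have equal prior weight. The function names and the toy data are invented here.

```python
import math
from itertools import combinations


def kl(p, q):
    """KL divergence KL(p || q) in nats; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def merge_cost(w_a, p_a, w_b, p_b):
    """Information lost by merging two clusters:
    (w_a + w_b) times the weighted Jensen-Shannon divergence of p_a and p_b."""
    w = w_a + w_b
    pi_a, pi_b = w_a / w, w_b / w
    mix = [pi_a * a + pi_b * b for a, b in zip(p_a, p_b)]
    return w * (pi_a * kl(p_a, mix) + pi_b * kl(p_b, mix))


def agglomerative_ib(rows, n_clusters):
    """Greedy aIB over rows, where each row is a distribution p(y | x).

    Assumption: uniform prior p(x) = 1 / len(rows).
    Returns the clusters as sorted lists of row indices.
    """
    n = len(rows)
    clusters = [{"members": [i], "w": 1.0 / n, "p": list(r)}
                for i, r in enumerate(rows)]
    while len(clusters) > n_clusters:
        # Pick the pair of clusters whose merger is cheapest.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda ab: merge_cost(clusters[ab[0]]["w"], clusters[ab[0]]["p"],
                                      clusters[ab[1]]["w"], clusters[ab[1]]["p"]),
        )
        ca, cb = clusters[a], clusters[b]
        w = ca["w"] + cb["w"]
        merged = {
            "members": sorted(ca["members"] + cb["members"]),
            "w": w,
            # The merged cluster's distribution is the weight-averaged mixture.
            "p": [(ca["w"] * x + cb["w"] * y) / w for x, y in zip(ca["p"], cb["p"])],
        }
        clusters = [c for i, c in enumerate(clusters) if i not in (a, b)] + [merged]
    return sorted(c["members"] for c in clusters)


# Hypothetical toy data: share of (page, image, other) requests per session.
sessions = [
    [0.20, 0.70, 0.10],
    [0.25, 0.65, 0.10],
    [0.90, 0.00, 0.10],
    [0.85, 0.05, 0.10],
]
print(agglomerative_ib(sessions, 2))  # [[0, 1], [2, 3]]
```

In this toy run the two image-heavy sessions end up in one cluster and the two page-heavy sessions in the other. The exhaustive pair search is quadratic per merge, which is acceptable for a sketch but would need a cached cost matrix for realistic log sizes.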

Keywords: Web user, Web bot, Internet robot, Bot detection, Machine learning, Unsupervised learning

Review history: Received 12 September 2019, Revised 31 March 2020, Accepted 3 April 2020, Available online 11 April 2020, Version of Record 24 April 2020.

Official paper URL: https://doi.org/10.1016/j.knosys.2020.105875