Session stitching using sequence fingerprinting for web page visits

Authors:

Highlights:

• We present the first session stitching approach using sequence fingerprinting.

• Using behavioural constraints, web visits can be profiled in detail.

• The approach requires less sensitive data than competing approaches.

• The approach can be used on any web log and is vendor-agnostic.

• Results are competitive with privacy-sensitive and embedding approaches.

Abstract

The nature of people's web navigation has changed significantly in recent years. The advent of smartphones and other handheld devices has led web users to consult websites from more than one device, or from a shared device. As a result, large volumes of seemingly disjoint data are available which, when analysed together, can support decision-making. However, identifying web sessions by linking such data back to a specific person is hard. Session stitching aims to overcome this by using machine learning inference to identify similar or identical users. Many such efforts train matching algorithms on demographic data or device-based features. However, these variables are often unavailable for a given dataset or are recorded differently across datasets, which makes a streamlined setup difficult. Moreover, they often produce vast feature spaces that are hard to interpret in an actionable way.
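The abstract does not spell out the stitching algorithm, so the following is only an illustrative sketch of the general idea of linking sessions by their page-visit sequences, and not the paper's method. It assumes each session is an ordered list of page identifiers, represents it as a "fingerprint" of consecutive page pairs (bigrams), and links two sessions when the Jaccard similarity of their fingerprints meets a threshold. The functions `fingerprint`, `jaccard` and `stitch`, the threshold of 0.5 and the sample sessions are all hypothetical.

```python
from itertools import combinations


def fingerprint(pages, n=2):
    """Hypothetical fingerprint: the set of consecutive page n-grams in a session."""
    return {tuple(pages[i:i + n]) for i in range(len(pages) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; defined as 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def stitch(sessions, threshold=0.5, n=2):
    """Return (id1, id2, score) for every session pair whose fingerprints are
    at least `threshold` similar (an assumed, illustrative linking rule)."""
    prints = {sid: fingerprint(pages, n) for sid, pages in sessions.items()}
    links = []
    for s1, s2 in combinations(sorted(prints), 2):
        score = jaccard(prints[s1], prints[s2])
        if score >= threshold:
            links.append((s1, s2, round(score, 2)))
    return links


# Hypothetical sessions recorded on different devices.
sessions = {
    "phone-1": ["home", "news", "sports", "scores"],
    "laptop-7": ["home", "news", "sports", "weather"],
    "tablet-3": ["login", "account", "billing"],
}

print(stitch(sessions))  # [('laptop-7', 'phone-1', 0.5)]
```

Here the phone and laptop sessions share two of their four distinct page transitions, so they are linked, while the tablet session shares none. A real system would need to weigh such sequence evidence against timing and other behavioural signals rather than rely on a single fixed threshold.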

Keywords: Session stitching, Web analytics, Sequence mining, Session fingerprinting

Article history: Received 13 July 2020, Revised 22 April 2021, Accepted 22 April 2021, Available online 28 April 2021, Version of Record 24 September 2021.

Publisher page: https://doi.org/10.1016/j.dss.2021.113579