Streaming association rule (SAR) mining with a weighted order-dependent representation of Web navigation patterns

Abstract

This paper considers the problem of finding predictive and useful association rules with a new Web mining algorithm, the streaming association rule (SAR) model. We first adopt a weighted order-dependent scheme (assigning more weight to pages visited earlier) rather than the traditional Boolean scheme (assigning 1 to visited and 0 to non-visited pages). In this way, we intend to improve on the limited representation of navigation patterns in previous association rule mining (ARM) algorithms. We also note that most traditional association rule models are not scalable, because they require multiple scans of all records to re-calibrate a predictive model whenever the original databases are updated. The proposed SAR model takes a "divide-and-conquer" approach and requires only a single scan of the data sets to avoid the curse of dimensionality. Through comparative experiments on a real-world data set, we show that prediction models based on a weighted order-dependent representation are more accurate in predicting the next moves of Web navigators than models based on a Boolean representation. In particular, when combined with several heuristics developed to eliminate redundant association rules, SAR models achieve prediction accuracy comparable to that of traditional ARM models while retaining only a small fraction of the association rules. Finally, we quantify and graphically show the significance, or contribution, of each page to forming the unique rule sets in each database segment.
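To make the contrast between the two representations concrete, the following minimal Python sketch encodes one navigation session both ways. The abstract does not state the weighting formula, so the linear decay used here (weight (n - i) / n for the i-th distinct page of n) is purely an assumption for illustration, as are the function names, the page names, and the choice to weight a revisited page by its first visit.

```python
from typing import List, Sequence


def boolean_vector(session: Sequence[str], pages: Sequence[str]) -> List[float]:
    """Boolean scheme: 1 for a visited page, 0 for a non-visited page."""
    visited = set(session)
    return [1.0 if page in visited else 0.0 for page in pages]


def weighted_vector(session: Sequence[str], pages: Sequence[str]) -> List[float]:
    """Order-dependent scheme: pages visited earlier receive more weight.

    ASSUMPTION: linear decay (n - i) / n over the n distinct pages in
    first-visit order; the paper's actual weighting may differ.
    """
    # dict.fromkeys keeps first-visit order and drops repeat visits.
    order = list(dict.fromkeys(session))
    n = len(order)
    weights = {page: (n - i) / n for i, page in enumerate(order)}
    return [weights.get(page, 0.0) for page in pages]


# Hypothetical site with four pages and one session that revisits "news".
pages = ["home", "news", "sports", "mail"]
session = ["news", "home", "news", "mail"]

bool_vec = boolean_vector(session, pages)
weighted_vec = weighted_vector(session, pages)
```

Under these assumptions the Boolean vector records only which pages were visited, whereas the weighted vector also preserves that "news" came first, "home" second, and "mail" last.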

Keywords: Web mining, Association rule, Streaming association rule, Weighted order-dependent representation

Review history: Available online 12 November 2008.

Official paper URL: https://doi.org/10.1016/j.eswa.2008.10.068