Rough clustering of sequential data

Authors:

Highlights:

Abstract

This paper presents a new indiscernibility-based rough agglomerative hierarchical clustering algorithm for sequential data. In this approach, the indiscernibility relation is extended to a tolerance relation by relaxing the transitivity property. Initial clusters are formed using a similarity upper approximation. Subsequent clusters are formed using a constrained-similarity upper approximation, in which a relative-similarity condition serves as the merging criterion. We report experimental results on the msnbc web navigation dataset, which is intrinsically sequential in nature. We compare the results of the proposed approach with those of the traditional hierarchical clustering algorithm using vector coding of sequences. The results establish the viability of the proposed approach. The rough clusters produced by the proposed algorithm provide interpretations of the different navigation orientations of users present in the sessions without having to fit each object into only one group. Such descriptions can help web miners identify potential and meaningful groups of users.
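The clustering idea summarized in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract does not give the similarity metric or the exact form of the relative-similarity condition, so the `jaccard` placeholder (which ignores item order), the overlap-ratio definition in `relative_similarity`, the thresholds, and all function names below are assumptions chosen only to make the steps concrete. An order-aware sequence similarity would replace `jaccard` for genuinely sequential data.

```python
from typing import Callable, FrozenSet, List, Sequence, Set

Session = Sequence[str]


def jaccard(a: Session, b: Session) -> float:
    """Placeholder similarity (assumption): overlap of item sets, order ignored."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def similarity_classes(
    data: List[Session], sim: Callable[[Session, Session], float], threshold: float
) -> List[Set[int]]:
    """Tolerance classes: for each object, the indices of all objects at least
    `threshold`-similar to it. The relation is reflexive and symmetric but
    need not be transitive, so classes may overlap."""
    n = len(data)
    return [{j for j in range(n) if sim(data[i], data[j]) >= threshold} for i in range(n)]


def relative_similarity(a: Set[int], b: Set[int]) -> float:
    """Assumed merging criterion: shared members divided by all members."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def constrained_upper_approximation(
    classes: List[Set[int]], min_rel_sim: float
) -> List[Set[int]]:
    """Grow each cluster with the members' own clusters' objects, keeping only
    objects whose cluster is relatively similar enough to the current one."""
    grown = []
    for i, cls in enumerate(classes):
        candidates = set().union(*(classes[j] for j in cls))
        kept = {j for j in candidates if relative_similarity(cls, classes[j]) >= min_rel_sim}
        grown.append({i} | kept)
    return grown


def rough_clusters(
    data: List[Session],
    sim: Callable[[Session, Session], float] = jaccard,
    threshold: float = 0.5,
    min_rel_sim: float = 0.5,
    max_iter: int = 10,
) -> List[FrozenSet[int]]:
    """Iterate the constrained growth step until clusters stop changing, then
    return the distinct (possibly overlapping) clusters."""
    classes = similarity_classes(data, sim, threshold)
    for _ in range(max_iter):
        grown = constrained_upper_approximation(classes, min_rel_sim)
        if grown == classes:
            break
        classes = grown
    return sorted({frozenset(c) for c in classes}, key=sorted)
```

For three hypothetical sessions `["news", "sports"]`, `["news", "sports", "tech"]`, and `["sports", "tech"]` with `threshold=0.6`, the first and third sessions are each similar to the second but not to each other, so the tolerance classes overlap. With `min_rel_sim=0.5` the clusters stay separate and the middle session belongs to all of them; lowering `min_rel_sim` to 0.3 merges everything into one cluster.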

Keywords: Clustering, Rough sets, Constrained-similarity upper approximation, Web mining, Similarity metric, Sequential data

Review history: Received 7 February 2006, Revised 10 October 2006, Accepted 19 January 2007, Available online 20 February 2007.

Official paper URL: https://doi.org/10.1016/j.datak.2007.01.003