Discovering pattern-based subspace clusters by pattern tree

Abstract

Traditional clustering models based on distance similarity are not always effective at capturing correlation among data objects, whereas pattern-based clustering is well suited to identifying correlation hidden among them. However, state-of-the-art pattern-based clustering methods are inefficient and provide no metric for measuring clustering quality. This paper presents a new pattern-based subspace clustering method that addresses both problems. Observing the analogy between mining frequent itemsets and discovering subspace clusters, we apply the pattern tree, a structure used in frequent itemset mining, to determine the target subspaces with a single scan of the database, which remains efficient on large datasets. Furthermore, we introduce a general clustering quality evaluation model to guide the identification of meaningful clusters. The proposed method lets users flexibly set quality-control parameters to meet different needs. Experimental results on synthetic and real datasets show that our method outperforms existing methods in both efficiency and effectiveness.
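The contrast between distance similarity and pattern similarity can be illustrated with a small sketch. The abstract does not state which coherence measure the paper uses, so the code below assumes the pScore of the earlier pCluster model as a stand-in: for objects x, y and attributes a, b, pScore = |(x_a - x_b) - (y_a - y_b)|. The function names, the threshold `delta`, and the toy data are hypothetical and chosen only for illustration; this is not the paper's pattern-tree algorithm.

```python
from itertools import combinations


def pscore(x, y, a, b):
    """pScore of the 2x2 submatrix formed by objects x, y on attributes a, b."""
    return abs((x[a] - x[b]) - (y[a] - y[b]))


def is_pattern_cluster(rows, attrs, delta):
    """True if every object pair is delta-coherent on every attribute pair."""
    return all(
        pscore(x, y, a, b) <= delta
        for x, y in combinations(rows, 2)
        for a, b in combinations(attrs, 2)
    )


def euclidean(x, y, attrs):
    """Euclidean distance between x and y restricted to the given attributes."""
    return sum((x[a] - y[a]) ** 2 for a in attrs) ** 0.5


# Hypothetical toy data: o2 equals o1 shifted by +100 on attributes 0, 1, 2,
# but the two objects disagree on attribute 3.
o1 = [1, 5, 3, 9]
o2 = [101, 105, 103, 2]

subspace = [0, 1, 2]
far_apart = euclidean(o1, o2, subspace) > 100           # large distance
coherent = is_pattern_cluster([o1, o2], subspace, 0)    # identical pattern
full_space = is_pattern_cluster([o1, o2], [0, 1, 2, 3], 0)
```

In this toy case the two objects are far apart in distance yet perfectly coherent in the subspace {0, 1, 2}, and the coherence disappears once attribute 3 is included. That is why a pattern-based method has to search for the subspace as well as the object set.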

Keywords: Clustering analysis, Subspace clustering, Pattern similarity, Pattern tree

Review history: Received 9 November 2007, Accepted 21 February 2009, Available online 3 March 2009.

Official paper URL: https://doi.org/10.1016/j.knosys.2009.02.011