A clustering based approach for skyline diversity

作者:

Highlights:

摘要

Skyline query processing has recently received a lot of attention in database and data-mining communities. To the best of our knowledge, the existing researches mainly focus on considering how to efficiently return the whole skyline set. However, when the cardinality and dimensionality of input objects increase, the number of skylines grows exponentially, and hence this “huge” skyline set is completely useless to users. On the other hand, in most real applications, the objects are usually clustered, and therefore many objects have similar attribute values. Motivated by the above facts, in this paper, we present a novel type of SkyCluster query to capture the skyline diversity and improve the usefulness of skyline result. The SkyCluster query integrates K-means clustering into skyline computation, and returns K “representative” and “diverse” skyline objects to users. To process such query, a straightforward approach is to simply integrate the existing techniques developed for skyline-only and clustering-only together. But this approach is costly since both skyline computation and K-means clustering are all CPU-sensitive. We propose an efficient evaluation approach which is based on the circinal index to seamlessly integrate subspace skyline computation, K-means clustering and representatives selection. Also, we present a novel optimization heuristic to further improve the query performance. Experimental study shows that our approach is both efficient and effective.

论文关键词:K-means clustering,Skyline query,Diversity,Circinal index,Performance evaluation

论文评审过程:Available online 30 December 2010.

论文官网地址:https://doi.org/10.1016/j.eswa.2010.12.104