k-CEVCLUS: Constrained evidential clustering of large dissimilarity data

Authors:

Highlights:

Abstract

In evidential clustering, cluster-membership uncertainty is represented by Dempster–Shafer mass functions. The EVCLUS algorithm is an evidential clustering procedure for dissimilarity data, based on the assumption that similar objects should be assigned mass functions with a low degree of conflict. CEVCLUS is a version of EVCLUS that allows one to use prior information on cluster membership in the form of pairwise must-link and cannot-link constraints. The original CEVCLUS algorithm was shown to perform very well, but it was slow and limited to small datasets. In this paper, we introduce a much faster and more efficient version of CEVCLUS, called k-CEVCLUS, which is several orders of magnitude faster than EVCLUS and has storage and computational complexity linear in the number of objects, making it applicable to large datasets (around 10^4 objects). We also propose a new constraint expansion strategy that yields drastic improvements in clustering results when only a few constraints are given.
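As a rough illustration of the quantity EVCLUS works with, the sketch below computes the degree of conflict between two mass functions, kappa_ij = sum of m_i(A) * m_j(B) over focal sets with an empty intersection, and compares it to dissimilarities using only k randomly sampled partners per object. Sampling k partners per object is assumed here as one way to keep cost linear in the number of objects. The dict-of-frozensets representation, the function names (`conflict`, `sample_pairs`, `sampled_stress`) and the simple normalized squared-error stress are all hypothetical choices for illustration. The published algorithm's exact stress, dissimilarity transformation, optimizer, constraint terms and constraint expansion strategy are not reproduced.

```python
import itertools
import random

# A mass function is a dict mapping focal sets (frozensets of cluster labels)
# to masses summing to 1; frozenset() is the empty set.


def conflict(m_i, m_j):
    """Degree of conflict: sum of m_i(A) * m_j(B) over pairs with A & B empty."""
    return sum(
        mass_a * mass_b
        for (a, mass_a), (b, mass_b) in itertools.product(m_i.items(), m_j.items())
        if not (a & b)
    )


def sample_pairs(n, k, seed=0):
    """For each object i, draw k distinct partners j != i (O(n*k) pairs total).

    Hypothetical helper: samples from range(n - 1) and shifts indices >= i
    up by one so that i itself is never drawn.
    """
    rng = random.Random(seed)
    pairs = []
    for i in range(n):
        for s in rng.sample(range(n - 1), k):
            pairs.append((i, s if s < i else s + 1))
    return pairs


def sampled_stress(masses, delta, pairs):
    """Squared mismatch between conflicts and target dissimilarities delta(i, j),
    normalized by the sum of squared targets (an illustrative choice)."""
    num = sum((conflict(masses[i], masses[j]) - delta(i, j)) ** 2 for i, j in pairs)
    den = sum(delta(i, j) ** 2 for i, j in pairs)
    return num / den


# Usage: two clusters labelled 1 and 2, three objects.
m0 = {frozenset({1}): 0.9, frozenset({1, 2}): 0.1}
m1 = {frozenset({1}): 0.8, frozenset({1, 2}): 0.2}
m2 = {frozenset({2}): 0.7, frozenset(): 0.1, frozenset({1, 2}): 0.2}
masses = [m0, m1, m2]

print(conflict(m0, m1))  # objects 0 and 1 agree: no conflict
print(conflict(m0, m2))  # 0.9*0.7 + 0.9*0.1 + 0.1*0.1, about 0.73
```

Note that mass on the empty set always counts toward the conflict, since the empty set intersects nothing. An optimizer would adjust the mass functions to drive the sampled stress down, so that similar objects end up with low conflict.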

Keywords: Evidence theory, Dempster–Shafer theory, Belief functions, Relational data, Credal partition, Constrained clustering, Instance-level constraints

Article history: Received 2 August 2017, Revised 21 November 2017, Accepted 22 November 2017, Available online 22 November 2017, Version of Record 17 January 2018.

Official paper URL: https://doi.org/10.1016/j.knosys.2017.11.023