Cluster validation using an ensemble of supervised classifiers

Abstract

A cluster validity index is used to select which clustering algorithm to apply to a given problem. It does so by evaluating the quality of the partition output by each candidate clustering algorithm, which addresses the common case in which no expert in the problem domain is available. Most existing validity indexes make assumptions, for example that each cluster of the partition has a particular underlying shape such as a hypersphere, and they yield incorrect evaluations when these assumptions do not hold. Here, we propose a new cluster validity index that attempts to avoid this bias by using an ensemble of distinct supervised classifiers; the bias is then attributable not to a single classifier but to the collection as a whole, which alleviates the problem. The rationale behind our index is that a good partition should also induce a good classifier: the better the classification performance, the better the quality of the partition under evaluation. Note that the partition being assessed is treated as a labeled dataset, in which each object is labeled with the cluster it belongs to. We have tested our index on 50 numerical datasets, each partitioned using six different clustering algorithms. In our experiments, our index outperforms five validity indexes, including the most popular ones.
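The idea of scoring a partition by how well classifiers learn it can be sketched as follows. This is a minimal illustration under our own assumptions, not the authors' index: the ensemble members (1-NN, 5-NN and a nearest-centroid classifier), 5-fold cross-validated accuracy as the performance measure, and plain averaging across classifiers are all hypothetical choices, as are the names `ensemble_validity`, `cv_accuracy`, `CLASSIFIERS` and `select_partition`.

```python
import random
from collections import Counter, defaultdict
from statistics import mean


def _dist2(a, b):
    # Squared Euclidean distance between two points.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn_predict(train_X, train_y, x, k):
    # Majority vote among the k nearest training points.
    nearest = sorted(range(len(train_X)), key=lambda i: _dist2(train_X[i], x))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


def nearest_centroid_predict(train_X, train_y, x):
    # Assign x to the cluster label whose training centroid is closest.
    groups = defaultdict(list)
    for point, label in zip(train_X, train_y):
        groups[label].append(point)
    centroids = {lab: [mean(c) for c in zip(*pts)] for lab, pts in groups.items()}
    return min(centroids, key=lambda lab: _dist2(centroids[lab], x))


# Hypothetical ensemble of distinct classifiers (assumption, not the paper's choice).
CLASSIFIERS = {
    "1-NN": lambda X, y, x: knn_predict(X, y, x, 1),
    "5-NN": lambda X, y, x: knn_predict(X, y, x, 5),
    "nearest-centroid": nearest_centroid_predict,
}


def cv_accuracy(predict, X, labels, folds=5, seed=0):
    # k-fold cross-validated accuracy, using the cluster labels as class labels.
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    correct = 0
    for f in range(folds):
        test = set(idx[f::folds])
        train_X = [X[i] for i in idx if i not in test]
        train_y = [labels[i] for i in idx if i not in test]
        for i in test:
            correct += predict(train_X, train_y, X[i]) == labels[i]
    return correct / len(X)


def ensemble_validity(X, labels, folds=5):
    # Score of a partition: mean accuracy over the ensemble (higher is better).
    return mean(cv_accuracy(p, X, labels, folds) for p in CLASSIFIERS.values())


def select_partition(X, candidates):
    # Pick the candidate partition (e.g. one per clustering algorithm) with the best score.
    return max(candidates, key=lambda name: ensemble_validity(X, candidates[name]))
```

For example, on two well-separated blobs, the partition that follows the blobs scores near 1.0, while one that mixes points from both blobs scores much lower, so `select_partition` prefers it. Note that with plain accuracy a trivial partition that places every object in one cluster also scores 1.0; a practical index must correct for such degenerate partitions, which this sketch does not attempt.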

Keywords: Clustering, Cluster validity indexes, Classifier ensembles

Article history: Received 13 July 2017, Revised 9 December 2017, Accepted 4 January 2018, Available online 6 January 2018, Version of Record 20 February 2018.

Paper URL: https://doi.org/10.1016/j.knosys.2018.01.010