Shared farthest neighbor approach to clustering of high dimensionality, low cardinality data

Authors:

Abstract

Clustering algorithms are routinely used in biomedical disciplines and are a basic tool in bioinformatics. Depending on the task at hand, the two most popular options are central partitional techniques and agglomerative hierarchical clustering techniques, along with their derivatives. These methods are well studied and well established. However, both categories have drawbacks: partitional algorithms are affected by data dimensionality, and hierarchical agglomerative algorithms by their bottom-up structure. To overcome these limitations, and motivated by the problem of gene expression analysis with DNA microarrays, we present a hierarchical clustering algorithm based on a completely different principle: the analysis of shared farthest neighbors. We present a framework for clustering using ranks and indexes, and introduce the shared farthest neighbors (SFN) clustering criterion. We illustrate the properties of the method and present experimental results on different data sets, evaluating the clustering against extrinsic knowledge given by class labels.
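
To make the idea of shared farthest neighbors concrete, the following is a minimal sketch and not the paper's algorithm. It assumes, by analogy with shared-nearest-neighbor schemes, that two points are considered similar when their lists of k farthest points overlap. The function names, the Euclidean distance, the parameter k, and the toy data are all hypothetical choices made for illustration; the abstract does not specify how the SFN criterion is actually defined.

```python
from math import dist

def farthest_neighbors(points, k):
    """For each point, return the set of indexes of its k farthest points.

    Assumption: Euclidean distance; the abstract does not state the metric.
    """
    result = []
    for i, p in enumerate(points):
        # Rank every other point by decreasing distance from p.
        others = [j for j in range(len(points)) if j != i]
        ranked = sorted(others, key=lambda j: dist(p, points[j]), reverse=True)
        result.append(set(ranked[:k]))
    return result

def shared_farthest_counts(points, k):
    """Return a matrix whose (i, j) entry counts the farthest neighbors
    that points i and j have in common (an illustrative similarity)."""
    far = farthest_neighbors(points, k)
    n = len(points)
    return [[len(far[i] & far[j]) for j in range(n)] for i in range(n)]

# Toy data: two well-separated pairs of points in the plane.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
counts = shared_farthest_counts(points, k=2)

for row in counts:
    print(row)
```

In this toy example, points in the same pair share both of their farthest neighbors, while points in different pairs share none, so the overlap count separates the two groups without looking at nearest neighbors at all.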

Keywords: Data clustering, Similarity-based clustering, High-dimensional data analysis, DNA microarray data analysis

Review history: Received 25 May 2006, Revised 19 June 2006, Accepted 22 June 2006, Available online 22 August 2006.

Official paper URL: https://doi.org/10.1016/j.patcog.2006.06.021