The role of significance tests in consistent interpretation of nested partitions

Authors:

Highlights:

Abstract

Cluster interpretation is an important step toward properly understanding a set of classes, regardless of whether the classes were discovered automatically or defined by experts. Understanding the classes is crucial if they are to serve as the basis of a decision-making process.

The abundant literature on cluster validity focuses mainly on validating clusters from a structural point of view. However, structural validity does not ensure that a clustering is useful, because meaningfulness is what guarantees that the classes can support further decisions. In previous works, special significance tests taken from the field of multivariate analysis were introduced into an interpretation methodology to automatically identify the variables that are relevant to particular classes.

In this paper, we address the interpretation of nested partitions and study the relationship between the interpretations of the two partitions. In particular, we study and diagnose the inconsistencies that arise in interpretation when a second partition refines the first at a finer level of granularity, and we propose a modification of the original methodology that guarantees consistency in these cases. The relevant characteristics detected in a parent class must also be inherited by its subclasses, or at least by some of them.

The proposal is evaluated on a real data set describing the baseline health conditions and dietary habits of a sample of the general population.
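The abstract does not state which significance tests the methodology uses. As an illustration only, the Python sketch below assumes the classical test value (valeur-test) of Lebart and colleagues for a numeric variable in a class, and assumes a variable is called relevant when |t| > 1.96 (roughly a two-sided 5% level under a normal approximation). The function names `v_test`, `is_relevant` and `check_inheritance`, the threshold, and the toy data are hypothetical. The sketch shows how the inconsistency described above can arise; it is not the authors' modified methodology.

```python
import math
from statistics import mean, pvariance


def v_test(class_values, all_values):
    """Test value of a numeric variable in one class (assumed Lebart-style).

    Compares the class mean with the overall mean, scaled by the standard
    error of a mean of n_k values drawn without replacement from n values.
    """
    n, n_k = len(all_values), len(class_values)
    if not 0 < n_k < n:
        raise ValueError("the class must be a non-empty proper subset")
    s2 = pvariance(all_values)  # variance of the whole sample (divisor n)
    se = math.sqrt((n - n_k) / (n - 1) * s2 / n_k)
    return (mean(class_values) - mean(all_values)) / se


def is_relevant(class_values, all_values, threshold=1.96):
    # Assumed decision rule: |t| above ~1.96 (two-sided 5%, normal approx.).
    return abs(v_test(class_values, all_values)) > threshold


def check_inheritance(parent, children, all_values, threshold=1.96):
    """True if the inheritance requirement holds for this variable:
    the parent is not relevant, or at least one child is relevant."""
    if not is_relevant(parent, all_values, threshold):
        return True
    return any(is_relevant(c, all_values, threshold) for c in children)


# Hypothetical toy data: a parent class of 8 individuals split into two
# subclasses of 4 that have exactly the parent's mean.
child_a = [2.0] * 4
child_b = [2.0] * 4
parent = child_a + child_b
all_values = parent + [0.0] * 16 + [2.0] * 16

print(f"{v_test(parent, all_values):.2f}")   # 2.55 -> relevant
print(f"{v_test(child_a, all_values):.2f}")  # 1.70 -> not relevant
print(check_inheritance(parent, [child_a, child_b], all_values))  # False
```

Both subclasses keep the parent's mean, yet their smaller size inflates the standard error and lowers the sensitivity of the test. This is one way a characteristic that is relevant in a parent class can fail to be detected in any of its subclasses.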

Keywords: Clustering, Nested partitions, Statistical tests, Sensitivity of a test, Cluster interpretation, Consistency

Article history: Received 22 October 2014; available online 10 February 2015; Version of Record 2 September 2015.

Official URL: https://doi.org/10.1016/j.cam.2015.01.031