A hierarchical semantic-based distance for nominal histogram comparison

Authors:

Abstract

We propose a new distance, the Hierarchical Semantic-Based Distance (HSBD), for comparing nominal histograms equipped with a dissimilarity matrix that encodes the semantic correlations between bins. The distance is computed with a hierarchical strategy that progressively merges the considered instances (and their bins) according to their semantic proximity. At each level of this hierarchy, a standard bin-to-bin distance is computed between the corresponding pair of histograms. These bin-to-bin distances are then fused, taking into account the semantic coherence of their associated level, to obtain the proposed distance. As a result, the proposed distance can handle histograms that are usually compared using cross-bin distances. It preserves the advantages of such cross-bin distances (namely robustness to histogram translation and to histogram bin size issues) while inheriting the low computational cost of bin-to-bin distances. Validation in the context of geographical data classification demonstrates the relevance and usefulness of the proposed distance.
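The hierarchical strategy can be illustrated with a minimal Python sketch. It is not the authors' exact formulation: the abstract does not specify the merging criterion, the bin-to-bin distance, or the fusion rule, so the sketch assumes single-linkage merging of bins, the L1 distance at each level, and fusion weights equal to the normalized gap between consecutive merge heights. The function names (`merge_levels`, `hierarchical_distance`) and the toy data are hypothetical.

```python
def l1(p, q):
    """Bin-to-bin L1 distance between two histograms of equal length."""
    return sum(abs(a - b) for a, b in zip(p, q))


def merge_levels(dissim):
    """Build a bin hierarchy by single-linkage merging (an assumption).

    Returns a list of (height, clusters) pairs, from one cluster per bin
    (height 0) up to a single cluster containing every bin.
    """
    clusters = [[i] for i in range(len(dissim))]
    levels = [(0.0, [c[:] for c in clusters])]
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: closest pair of bins across the two clusters.
                d = min(dissim[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
        levels.append((d, [c[:] for c in clusters]))
    return levels


def hierarchical_distance(h1, h2, dissim):
    """Fuse per-level L1 distances, weighted by merge-height gaps (assumed rule)."""
    s1, s2 = sum(h1), sum(h2)
    p = [v / s1 for v in h1]  # normalize so merged histograms stay comparable
    q = [v / s2 for v in h2]
    levels = merge_levels(dissim)
    top = levels[-1][0]
    if top == 0:
        return 0.0  # all bins are semantically identical
    total = 0.0
    for k in range(len(levels) - 1):
        height, clusters = levels[k]
        weight = (levels[k + 1][0] - height) / top
        merged_p = [sum(p[i] for i in c) for c in clusters]
        merged_q = [sum(q[i] for i in c) for c in clusters]
        total += weight * l1(merged_p, merged_q)
    return total


# Toy example: bins 0 and 1 are semantically close, bin 2 is far from both.
dissim = [[0.0, 0.2, 1.0],
          [0.2, 0.0, 1.0],
          [1.0, 1.0, 0.0]]
only_bin0, only_bin1, only_bin2 = [1, 0, 0], [0, 1, 0], [0, 0, 1]
near = hierarchical_distance(only_bin0, only_bin1, dissim)
far = hierarchical_distance(only_bin0, only_bin2, dissim)
```

In this toy setting the plain L1 distance is identical for both pairs, whereas the fused distance ranks the pair in semantically close bins (`near`) below the pair in distant bins (`far`). That is the cross-bin behaviour the abstract describes, obtained using only bin-to-bin computations at each level.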

Keywords: Histogram distance, Data representation, Nominal histogram, Semantic-based metric, Unsupervised classification, Information retrieval

Article history: Received 15 February 2012, Revised 23 May 2013, Accepted 8 June 2013, Available online 18 June 2013.

Official paper URL: https://doi.org/10.1016/j.datak.2013.06.002