T-copula and Wasserstein distance-based stochastic neighbor embedding

Abstract

Dimensionality reduction aims to obtain faithful low-dimensional representations of high-dimensional data while preserving data quality. Such representations make high-dimensional data easier to visualize and can improve classification and clustering performance. Many dimensionality reduction methods have been developed within the stochastic neighbor embedding framework. However, most of them use the Euclidean distance to describe the dissimilarity between data points in the high-dimensional space, which is not suitable for high-dimensional data with a non-linear manifold structure. In addition, they usually use the family of normal distributions as the embedding distribution in the low-dimensional space, which makes them suitable only for spherical data. To address these issues, we present a novel dimensionality reduction method that integrates the Wasserstein distance and the t-copula function into the stochastic neighbor embedding model. We first employ a Gaussian distribution equipped with the Wasserstein distance to describe pairwise similarity in the high-dimensional space. The t-copula function is then used to generate a general heavy-tailed distribution that describes pairwise similarity in the low-dimensional space; this distribution can handle data of different shapes and avoids the crowding problem. Furthermore, the Kullback–Leibler divergence is employed to measure the difference between the high-dimensional and low-dimensional similarities. Finally, a gradient descent algorithm with adaptive moment estimation is developed to optimize the proposed objective function. Extensive experiments on eight real-world datasets demonstrate the effectiveness of the proposed method in terms of dimensionality reduction quality and of classification and clustering evaluation metrics.
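
The pipeline outlined in the abstract (Gaussian similarities built on a non-Euclidean distance, a heavy-tailed low-dimensional similarity, a Kullback–Leibler objective, and adaptive moment estimation) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and several pieces are assumptions made only for illustration: each data row is treated as an equal-weight one-dimensional empirical sample so that a Wasserstein-1 distance between rows can be computed by sorting; a single fixed bandwidth `sigma` is used; and the plain Student-t kernel with one degree of freedom (as in t-SNE) stands in for the t-copula-based distribution, whose construction the abstract does not specify. The function names `pairwise_w1`, `gaussian_affinities`, `student_t_affinities`, `kl_divergence`, and `embed` are hypothetical.

```python
import numpy as np

def pairwise_w1(X):
    """Wasserstein-1 distance between rows of X.
    Assumption: each row is an equal-weight 1-D empirical sample, so W1 is
    the mean absolute difference of the sorted values."""
    S = np.sort(X, axis=1)
    return np.abs(S[:, None, :] - S[None, :, :]).mean(axis=2)

def gaussian_affinities(D, sigma=1.0):
    """Joint similarities P from a symmetric distance matrix D.
    Assumption: one fixed bandwidth sigma for all points."""
    W = np.exp(-D ** 2 / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    return W / W.sum()

def student_t_affinities(Y):
    """Heavy-tailed similarities Q in the embedding.
    Stand-in: Student-t kernel (1 d.o.f.), not the paper's t-copula."""
    sq = np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis=2)
    W = 1.0 / (1.0 + sq)
    np.fill_diagonal(W, 0.0)
    return W / W.sum(), W

def kl_divergence(P, Q, eps=1e-12):
    """KL(P || Q) over the entries where P is positive."""
    mask = P > 0
    return float(np.sum(P[mask] * np.log(P[mask] / (Q[mask] + eps))))

def embed(D, dim=2, steps=300, lr=0.1, sigma=1.0, seed=0):
    """Minimize KL(P || Q) over the embedding Y with Adam updates."""
    rng = np.random.default_rng(seed)
    P = gaussian_affinities(D, sigma)
    Y = 1e-2 * rng.standard_normal((D.shape[0], dim))
    m, v = np.zeros_like(Y), np.zeros_like(Y)
    b1, b2, eps = 0.9, 0.999, 1e-8
    history = []
    for t in range(1, steps + 1):
        Q, W = student_t_affinities(Y)
        history.append(kl_divergence(P, Q))
        # Gradient for the Student-t kernel:
        # 4 * sum_j (p_ij - q_ij) * w_ij * (y_i - y_j)
        A = (P - Q) * W
        grad = 4.0 * (A.sum(axis=1)[:, None] * Y - A @ Y)
        # Adam: bias-corrected first and second moment estimates
        m = b1 * m + (1 - b1) * grad
        v = b2 * v + (1 - b2) * grad ** 2
        Y = Y - lr * (m / (1 - b1 ** t)) / (np.sqrt(v / (1 - b2 ** t)) + eps)
    return Y, history

# Toy data: two groups of rows with different value distributions.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 1.0, (10, 5)), rng.normal(5.0, 1.0, (10, 5))])
D = pairwise_w1(X)
Y, history = embed(D)
```

Because `embed` only needs a symmetric distance matrix, any other dissimilarity can be passed in place of `pairwise_w1(X)`. The `history` list records the objective at each step, which gives a quick check that the optimizer is reducing the divergence on a given dataset.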

Keywords: Dimensionality reduction, Stochastic neighbor embedding, Copula function, Wasserstein distance

Review history: Received 14 September 2021, Revised 21 January 2022, Accepted 9 February 2022, Available online 18 February 2022, Version of Record 2 March 2022.

Official paper URL: https://doi.org/10.1016/j.knosys.2022.108431