Learning weighted distance metric from group level information and its parallel implementation

Authors: Hamidreza Mohebbi, Yang Mu, Wei Ding

Abstract

The performance of many machine learning algorithms relies heavily on the distance metric they use. A distance metric is usually learned from a training set alone, while other valuable information, such as group structure, is ignored. Samples within a short distance of one another form a group, which may contain several classes, and each sample may have partial membership in multiple groups. This group structure exists in both the training and test sets. Additionally, outliers have negative effects on a distance metric, and increasing the number of noisy samples during the learning phase may amplify those effects. Weighting is one way to alleviate this problem: more similar samples are given more weight. This paper introduces a technique for learning a weighted-distance metric. The semi-supervised method learns from the labeled information in the training set and identifies groups among the samples in the test set to form a metric space. In the experiments, the nearest neighbors algorithm is used as the classifier. The proposed weighted-distance metric improves classification accuracy by more than 10%. Furthermore, parallel implementations with optimized CPU and GPU code are developed to reduce computing time. Two parallel implementations, in Matlab and CUDA, are compared. In the experiments, parallel code that uses both the CPU and the GPU achieves a speedup of more than 3.7 times over the traditional CPU code.
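
As a rough illustration of how a weighted distance can change a nearest-neighbor decision, the following minimal Python sketch may help. It is not the authors' method: the abstract describes weights that are learned from labels and group structure, whereas here the weights are fixed by hand and applied per feature, which is an assumption made purely for illustration. The function names `weighted_distance` and `nearest_neighbor_predict` and the toy data are hypothetical.

```python
import math


def weighted_distance(x, y, w):
    """Weighted Euclidean distance: sqrt(sum_i w_i * (x_i - y_i)^2).

    Per-feature weights are an illustrative assumption, not the paper's scheme.
    """
    return math.sqrt(sum(wi * (xi - yi) ** 2 for xi, yi, wi in zip(x, y, w)))


def nearest_neighbor_predict(train_X, train_y, query, w):
    """Return the label of the training sample closest to `query` under `w`."""
    best_index = min(
        range(len(train_X)),
        key=lambda i: weighted_distance(train_X[i], query, w),
    )
    return train_y[best_index]


# Toy data (hypothetical): two labeled samples and one query point.
train_X = [[0.0, 0.0], [1.0, 5.0]]
train_y = ["a", "b"]
query = [0.9, 0.5]

# Uniform weights reduce to the ordinary Euclidean distance.
uniform = [1.0, 1.0]
# Hand-picked weights that ignore the second feature entirely.
first_feature_only = [1.0, 0.0]

pred_uniform = nearest_neighbor_predict(train_X, train_y, query, uniform)
pred_weighted = nearest_neighbor_predict(train_X, train_y, query, first_feature_only)
```

With uniform weights the query is closest to the first sample, while down-weighting the second feature makes the second sample the nearest neighbor. This shows only why the choice of weights matters for a nearest neighbors classifier; how the weights are actually learned is the subject of the paper itself.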

Keywords: Distance learning, Semi-supervised learning, Parallel computing, GPU acceleration

Paper URL: https://doi.org/10.1007/s10489-016-0826-7