Max-Margin Heterogeneous Information Machine for RGB-D Action Recognition

作者:Yu Kong, Yun Fu

摘要

We propose a novel approach, max-margin heterogeneous information machine (MMHIM), for human action recognition from RGB-D videos. MMHIM fuses heterogeneous RGB visual features and depth features, and learns effective action classifiers using the fused features. Rich heterogeneous visual and depth data are effectively compressed and projected to a learned shared space and independent private spaces, in order to reduce noise and capture useful information for recognition. Knowledge from various sources can then be shared with others in the learned space to learn cross-modal features. This guides the discovery of valuable information for recognition. To capture complex spatiotemporal structural relationships in visual and depth features, we represent both RGB and depth data in a matrix form. We formulate the recognition task as a low-rank bilinear model composed of row and column parameter matrices. The rank of the model parameter is minimized to build a low-rank classifier, which is beneficial for improving the generalization power. We also extend MMHIM to a structured prediction model that is capable of making structured outputs. Extensive experiments on a new RGB-D action dataset and two other public RGB-D action datasets show that our approaches achieve state-of-the-art results. Promising results are also shown if RGB or depth data are missing in training or testing procedure.

论文关键词:Action recognition, RGB-D videos, Heterogeneous data, Feature learning

论文评审过程:

论文官网地址:https://doi.org/10.1007/s11263-016-0982-6