Riemannian manifold-valued part-based features and geodesic-induced kernel machine for activity classification dedicated to assisted living
Abstract
In this paper, we address the problem of classifying human activities that are typical of a daily living environment from videos. We propose a novel method based on Riemannian manifolds that uses a two-layer tree structure, where the nodes in each tree branch lie on a Riemannian manifold. Each node corresponds to a different part-based covariance feature and induces a geodesic-based kernel machine for classification. In the first layer, activities are classified according to the dynamics of body pose and the movement of hands or arms. Activities with similar body pose and motion but different human-object interactions are coarsely grouped into the same category. In the second layer, the coarsely classified activities are further refined according to the appearance of local image patches around the hands in key frames. This is based on the observation that interacting objects, which serve as discriminative cues, are likely to be attached to the hands. The main novelties of this paper are: (i) the motion of body parts in each video activity is characterized by global features, specifically the distances between each pair of key points and the orientations of the lines connecting them; (ii) human-object interaction is described by local features, namely the appearance of local regions around the hands in key frames, where key frames are selected using the proximity of the hands to other key points; (iii) classification of human activities is formulated as a geodesic distance-induced kernel machine, which exploits pairwise geodesics on Riemannian manifolds under the log-Euclidean metric. Experiments were conducted on two video datasets. The first dataset, collected on our university campus, contains 8 activities with a total of 943 videos. The second is a publicly available dataset containing 7 activity classes and a total of 224 videos.
Our test results on the first video dataset show high classification accuracy (94.27% on average) and a low false-alarm rate (0.80% on average). On the second video dataset, the proposed method is compared with 6 existing methods and outperforms all of them. We also discuss the impact of the skeleton points detected by Kinect on the performance of activity classification.
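As a rough illustration of novelty (iii), the following sketch computes the geodesic distance between two symmetric positive-definite (SPD) covariance features under the log-Euclidean metric, d(X, Y) = ||log(X) − log(Y)||_F, and uses it to induce a Gaussian kernel. The function names, the RBF form of the kernel, and the random toy matrices are illustrative assumptions, not the paper's exact implementation:

```python
import numpy as np

def spd_logm(C):
    """Matrix logarithm of an SPD matrix via eigendecomposition."""
    w, V = np.linalg.eigh(C)
    return (V * np.log(w)) @ V.T

def log_euclidean_distance(X, Y):
    """Geodesic distance under the log-Euclidean metric:
    d(X, Y) = ||logm(X) - logm(Y)||_F."""
    return np.linalg.norm(spd_logm(X) - spd_logm(Y), ord="fro")

def geodesic_rbf_kernel(X, Y, sigma=1.0):
    """Gaussian kernel induced by the geodesic distance (an assumed
    kernel form; the paper may use a different kernel function)."""
    d = log_euclidean_distance(X, Y)
    return np.exp(-d ** 2 / (2.0 * sigma ** 2))

# Toy SPD "covariance features"; in the paper these would come from
# part-based descriptors, here they are random for illustration only.
rng = np.random.default_rng(0)

def random_spd(n=4):
    A = rng.standard_normal((n, n))
    return A @ A.T + n * np.eye(n)

C1, C2 = random_spd(), random_spd()
k = geodesic_rbf_kernel(C1, C2, sigma=2.0)
```

Such a kernel can then be plugged into a standard kernel machine (e.g., an SVM) so that classification respects the manifold geometry of the covariance features rather than treating them as flat Euclidean vectors.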
Article history: Received 6 September 2016, Revised 5 March 2017, Accepted 24 May 2017, Available online 25 May 2017, Version of Record 18 August 2017.
DOI: https://doi.org/10.1016/j.cviu.2017.05.012