Multi-label semantic concept detection in videos using fusion of asymmetrically trained deep convolutional neural networks and foreground driven concept co-occurrence matrix

Authors: Nitin J. Janwe, Kishor K. Bhoyar

Abstract

Describing the visual content of videos with semantic concepts is an effective and practical approach for video applications such as annotation, indexing, retrieval and ranking. These applications require video data to be labelled with a known set of labels or concepts. Assigning semantic concepts manually is not feasible given the large and ever-growing volume of video data, so automatic semantic concept detection in videos is an active research area. Deep convolutional neural networks (CNNs) have recently shown remarkable performance in computer vision tasks. In this paper, we present a novel approach to automatic semantic video concept detection using a deep CNN and a foreground driven concept co-occurrence matrix (FDCCM), which stores foreground-to-background concept co-occurrence values. The FDCCM is built by exploiting concept co-occurrence relationships in the pre-labelled TRECVID video dataset and in a collection of random images extracted from Google Images. To deal with the dataset imbalance problem, we extend this approach by fusing two asymmetrically trained deep CNNs and use the FDCCM to further improve concept detection. The proposed approach is compared with state-of-the-art video concept detection approaches on the widely used TRECVID dataset and is found to outperform them.
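
To make the idea of a foreground-to-background co-occurrence matrix more concrete, here is a minimal sketch in Python with NumPy. The abstract does not give the paper's actual construction, so everything below is an assumption for illustration only: the function names (`build_fdccm`, `fuse_scores`, `refine_with_fdccm`), the use of conditional frequencies as co-occurrence values, the weighted-average fusion of the two CNN score vectors, and the mixing parameter `alpha` are all hypothetical.

```python
import numpy as np


def build_fdccm(labels, foreground_idx, background_idx):
    """Estimate how often each background concept appears with each foreground concept.

    labels: (n_samples, n_concepts) binary array of multi-label annotations.
    Returns an (F, B) array of conditional frequencies (an assumed definition).
    """
    labels = np.asarray(labels, dtype=float)
    fg = labels[:, foreground_idx]           # (n, F) foreground annotations
    bg = labels[:, background_idx]           # (n, B) background annotations
    co_counts = fg.T @ bg                    # joint occurrence counts, (F, B)
    fg_counts = fg.sum(axis=0)[:, None]      # occurrences of each foreground concept
    return co_counts / np.maximum(fg_counts, 1.0)  # avoid division by zero


def fuse_scores(scores_a, scores_b, weight=0.5):
    """Fuse the per-concept scores of two networks by a weighted average (assumed rule)."""
    a = np.asarray(scores_a, dtype=float)
    b = np.asarray(scores_b, dtype=float)
    return weight * a + (1.0 - weight) * b


def refine_with_fdccm(scores, fdccm, foreground_idx, background_idx, alpha=0.3):
    """Blend background scores with the FDCCM row of the top-scoring foreground concept."""
    scores = np.asarray(scores, dtype=float).copy()
    top_fg = int(np.argmax(scores[foreground_idx]))  # position in the foreground list
    prior = fdccm[top_fg]                            # co-occurrence values for that concept
    scores[background_idx] = (1.0 - alpha) * scores[background_idx] + alpha * prior
    return scores


# Toy data: concepts 0 and 1 are treated as foreground, 2 and 3 as background.
toy_labels = [[1, 0, 1, 0],
              [1, 0, 1, 1],
              [0, 1, 0, 1]]
fdccm = build_fdccm(toy_labels, [0, 1], [2, 3])
fused = fuse_scores([1.0, 0.0, 0.3, 0.5], [0.8, 0.2, 0.5, 0.3])
refined = refine_with_fdccm(fused, fdccm, [0, 1], [2, 3], alpha=0.5)
```

In this toy setup the background scores are pulled toward the concepts that most often accompany the strongest foreground detection. How the paper actually applies the FDCCM, and how its two networks are trained and fused, may differ.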

Keywords: Semantic video concept detection, Foreground driven concept co-occurrence matrix, Convolutional neural network, Deep learning, Multi-label classification, Asymmetric training

Paper URL: https://doi.org/10.1007/s10489-017-1033-x