Towards Reversal-Invariant Image Representation

Authors: Lingxi Xie, Jingdong Wang, Weiyao Lin, Bo Zhang, Qi Tian

Abstract

State-of-the-art image classification approaches are mainly based on robust image representations, such as the bag-of-features (BoF) model or the convolutional neural network (CNN) architecture. In real applications, the orientation (left/right) of an image or an object may vary from sample to sample, whereas some handcrafted descriptors (e.g., SIFT) and network operations (e.g., convolution) are not reversal-invariant, leading to unsatisfactory stability of the image features extracted from these models. A popular solution is to augment the dataset by adding a left-right reversed copy of each image. This strategy improves recognition accuracy to some extent, but at the price of almost doubled time and memory consumption in both the training and testing stages. In this paper, we present an alternative solution based on designing reversal-invariant representations of local patterns, so that an image and its left-right reversed copy yield identical representations. For the BoF model, we design a reversal-invariant version of the SIFT descriptor named Max-SIFT, and a generalized RIDE algorithm that can be applied to a large family of local descriptors. For the CNN architecture, we present a simple idea for generating reversal-invariant deep features (RI-Deep) and, inspired by it, design reversal-invariant convolution (RI-Conv) layers that increase CNN capacity without increasing model complexity. Experiments reveal consistent accuracy gains on various image classification tasks, including scene understanding, fine-grained object recognition, and large-scale visual recognition.
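
The core idea behind Max-SIFT can be illustrated with a short sketch. The following is a minimal NumPy illustration, not the authors' code: it assumes the common 4x4 spatial grid with 8 orientation bins per cell (128 dimensions) and a particular orientation-bin convention (bin k centered at angle 2*pi*k/8), and the function names reverse_sift and max_sift are hypothetical.

```python
import numpy as np

def reverse_sift(desc, grid=4, bins=8):
    # Build the left-right reversed counterpart of a SIFT descriptor.
    # Assumed layout: index = (row*grid + col)*bins + ori. A horizontal flip
    # mirrors the spatial columns (c -> grid-1-c) and the gradient angles
    # (theta -> pi - theta, i.e. bin k -> (bins/2 - k) mod bins); the exact
    # permutation depends on the SIFT implementation's bin convention.
    d = np.asarray(desc, dtype=float).reshape(grid, grid, bins)
    d = d[:, ::-1, :]                            # flip spatial columns
    ori = (bins // 2 - np.arange(bins)) % bins   # mirror orientation bins
    return d[:, :, ori].reshape(-1)

def max_sift(desc):
    # Keep the lexicographically larger of the descriptor and its reversed
    # copy, so a patch and its mirror map to the same representation.
    d = np.asarray(desc, dtype=float).reshape(-1)
    r = reverse_sift(d)
    return d if tuple(d) >= tuple(r) else r

# Sanity check: the representation is identical for a patch and its mirror.
d = np.random.rand(128)
assert np.allclose(max_sift(d), max_sift(reverse_sift(d)))
```

The lexicographic maximum is one consistent tie-breaker: either of the two descriptors could be kept, as long as the same choice is made for a patch and its mirrored version, which is what makes the resulting representation reversal-invariant.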

Keywords: Image classification, BoF, CNN, Reversal-invariant image representation

Paper link: https://doi.org/10.1007/s11263-016-0970-x