From the whole to detail: Progressively sampling discriminative parts for fine-grained recognition

Authors:

Highlights:

Abstract

Fine-grained image recognition poses a special challenge because of the difficulty of distinguishing subtle inter-class differences under large intra-class variance. Existing weakly supervised approaches tend to capture the most discriminative regions, thereby guiding the network to learn fine-grained features. However, current methods neglect the correlation between the object and its details, even though object localization is conducive to part detection. In addition, they generally incur heavy computational cost to find details with an auxiliary subnetwork or a selection strategy, and they require carefully designed bounding boxes, which are inflexible for targets of different scales. In this paper, we propose a more lightweight framework that progressively samples discriminative parts to learn details from coarse scale to fine scale, without any pre-designed bounding boxes. Our method first amplifies the object (e.g., bird, car) from the original image according to class visual patterns; a self-adaptive region sampler is then applied to detect the most informative regions from attention maps and learn fine-grained representations. The framework consists of three streams (the whole, the object, and the detail), so that hierarchical features are preserved and learned. Furthermore, our approach can be trained end-to-end in a weakly supervised manner, and little computational cost is needed at inference. Comprehensive experiments and ablation studies demonstrate that the proposed method obtains competitive performance on three benchmarks.
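
The abstract does not say how the object is located from the attention maps. As a rough illustration only, one common weakly supervised approach is to threshold an attention map and crop the tight bounding box around the strongest responses. The sketch below assumes that approach; the function names `attention_bbox` and `to_image_box`, the `ratio` threshold, and the toy 4x4 map are all hypothetical and are not taken from the paper.

```python
def attention_bbox(attn, ratio=0.5):
    """Tight box (top, left, bottom, right), inclusive, around all cells
    whose activation is at least `ratio` times the map's maximum.
    Assumed heuristic, not the paper's stated procedure."""
    peak = max(max(row) for row in attn)
    thresh = ratio * peak
    rows = [i for i, row in enumerate(attn) if any(v >= thresh for v in row)]
    cols = [j for j in range(len(attn[0]))
            if any(row[j] >= thresh for row in attn)]
    return rows[0], cols[0], rows[-1], cols[-1]


def to_image_box(box, attn_size, image_size):
    """Scale an inclusive attention-map box to pixel coordinates
    (top, left, bottom, right), with bottom/right exclusive."""
    top, left, bottom, right = box
    sy = image_size[0] / attn_size[0]
    sx = image_size[1] / attn_size[1]
    return (int(top * sy), int(left * sx),
            int((bottom + 1) * sy), int((right + 1) * sx))


# Toy attention map with a strong response in the centre.
attn = [
    [0.0, 0.1, 0.0, 0.0],
    [0.1, 0.9, 0.8, 0.0],
    [0.0, 0.7, 1.0, 0.1],
    [0.0, 0.0, 0.1, 0.0],
]
box = attention_bbox(attn)
image_box = to_image_box(box, (4, 4), (224, 224))
print(box, image_box)
```

In a coarse-to-fine pipeline of the kind the abstract describes, the cropped region would be resized to the network input size and passed to the next stream; applying the same step again to the object stream's attention map would give a smaller region for the detail stream.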

Keywords: Fine-grained visual categorization, Progressive sampling, Weakly supervised object localization, Attention mechanism

Review history: Received 9 May 2021, Revised 21 October 2021, Accepted 23 October 2021, Available online 28 October 2021, Version of Record 6 November 2021.

Official paper URL: https://doi.org/10.1016/j.knosys.2021.107651