Multi-Scale MLP-Mixer for image classification

Authors:

Highlights:

Abstract

MLP-Mixer is a vision architecture that relies solely on multilayer perceptrons (MLPs) and, despite its simplicity, achieves accuracy only slightly inferior to state-of-the-art models on ImageNet. Because MLP-Mixer segments each input image into a fixed number of patches, small-scale MLP-Mixers (smaller patches, and hence more patches per image) are preferred for their better accuracy; however, this strategy significantly increases the computational burden. This paper argues that, even within the same dataset, each image has a different recognition difficulty depending on its characteristics. Ideally, then, choosing an appropriately scaled MLP-Mixer for each image is the most economical use of computation. We experimentally verify that this phenomenon exists, which inspires us to propose the Multi-Scale MLP-Mixer (MSMLP), which applies a suitably scaled MLP-Mixer to each input image. MSMLP comprises several MLP-Mixers of different scales. During testing, these MLP-Mixers are activated in order of scale from large to small (i.e., with an increasing number of patches and a decreasing patch size). In addition, to reduce redundant computation, a feature reuse mechanism is designed between neighboring MLP-Mixers so that the small-scale MLP-Mixer downstream can reuse the features learned by the large-scale MLP-Mixer upstream. Finally, extensive experiments on the public CIFAR-10/100 datasets show that our method significantly outperforms MLP-Mixer in both theoretically estimated computational cost and actual inference speed.
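The cascade described above can be sketched as a loop over mixers ordered from cheapest (large patches) to most expensive (small patches), with each stage handing its features to the next. This is a minimal sketch, not the paper's implementation: the exit criterion (a confidence threshold on the predicted class probabilities) and all names (`cascade_predict`, `stages`, `threshold`) are assumptions for illustration.

```python
def cascade_predict(stages, image, threshold=0.9):
    """Run mixers from largest patch scale (cheapest) to smallest.

    `stages` is a list of callables ordered large-scale -> small-scale;
    each maps (image, reused_features) -> (probs, features).
    The loop stops as soon as one stage is confident enough, so easy
    images never pay for the expensive small-scale mixer.
    """
    features = None  # features passed downstream for reuse
    probs = None
    for stage in stages:
        probs, features = stage(image, features)
        if max(probs) >= threshold:  # hypothetical early-exit rule
            break
    return probs

# Toy stages standing in for real mixers: a coarse stage that is
# unsure and a fine stage that reuses the coarse features.
def coarse(image, feats):
    return [0.6, 0.4], "coarse-feats"

def fine(image, feats):
    assert feats == "coarse-feats"  # downstream reuses upstream features
    return [0.95, 0.05], "fine-feats"
```

With the default threshold the coarse stage is not confident (0.6 < 0.9), so the fine stage also runs; lowering the threshold to 0.5 lets the coarse stage exit early, which is exactly the per-image compute saving the abstract claims.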

Keywords: MLP-Mixer, Theoretical computational cost, Multi-scale, Actual inference speed

Article history: Received 25 May 2022; Revised 23 August 2022; Accepted 24 August 2022; Available online 5 September 2022; Version of Record 19 October 2022.

Paper URL: https://doi.org/10.1016/j.knosys.2022.109792