Swin-MFINet: Swin transformer based multi-feature integration network for detection of pixel-level surface defects

Authors:

Abstract

Automatic surface defect detection is critical for manufacturing industries such as steel, fabric, and marble. This study proposes a Swin transformer-based model called Multi-Feature Integration Network (Swin-MFINet) for pixel-level surface defect detection. The proposed model consists of an encoder, a Swin transformer-based decoder, and Multi-Feature Integration (MFI) modules. In the encoder, a pre-trained Inception network extracts key features, which suits small datasets. In the decoder, global semantic features are obtained from these initial features by Swin transformer blocks, a recent transformer architecture. A convolution layer is also used in the last step of the decoder, because transformers are limited in capturing fine spatial details such as edges, colors, and textures, which are important for detecting some small defects. In the final module, MFI, feature maps from different decoder stages are combined, and a channel squeeze-spatial excitation block is applied to emphasize important features. Finally, a prediction map is obtained by applying a convolution layer followed by a sigmoid activation function to the MFI module output. The performance of the proposed model is analyzed on the MT and MVTec datasets, which contain surface defect images. The proposed model obtained mIoU scores of 81.37% and 77.07%, respectively, on these two datasets. These results outperform the state of the art for the surface defect detection problem.
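The final stage described in the abstract (combine decoder feature maps, apply channel squeeze-spatial excitation, then a convolution and a sigmoid) and the mIoU metric can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes that the decoder feature maps have already been resized to a common spatial size, that "combined" means channel-wise concatenation, that both convolutions are 1x1, and that mIoU is the mean IoU over the background and defect classes; the abstract does not state any of these details. All function names (`conv1x1`, `spatial_excitation`, `mfi_head`, `mean_iou`) are hypothetical.

```python
import numpy as np


def sigmoid(x):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-x))


def conv1x1(x, weight, bias):
    """1x1 convolution. x: (C_in, H, W), weight: (C_out, C_in), bias: (C_out,)."""
    return np.einsum("oc,chw->ohw", weight, x) + bias[:, None, None]


def spatial_excitation(x, weight, bias):
    """Channel squeeze and spatial excitation.

    The channels are squeezed to a single map by a 1x1 convolution, a sigmoid
    turns that map into a per-pixel gate in (0, 1), and the gate rescales
    every channel of the input.
    """
    gate = sigmoid(conv1x1(x, weight, bias))  # shape (1, H, W)
    return x * gate


def mfi_head(stage_maps, sse_w, sse_b, out_w, out_b):
    """Fuse decoder-stage maps and produce a defect probability map.

    Assumption: the maps share one spatial size and are fused by
    concatenation along the channel axis.
    """
    fused = np.concatenate(stage_maps, axis=0)
    excited = spatial_excitation(fused, sse_w, sse_b)
    return sigmoid(conv1x1(excited, out_w, out_b))[0]  # shape (H, W)


def mean_iou(pred_mask, true_mask, num_classes=2):
    """Mean intersection-over-union across classes present in either mask."""
    ious = []
    for c in range(num_classes):
        p = pred_mask == c
        t = true_mask == c
        union = np.logical_or(p, t).sum()
        if union == 0:
            continue  # class absent from both masks; skip it
        ious.append(np.logical_and(p, t).sum() / union)
    return float(np.mean(ious))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Three hypothetical decoder-stage maps, 4 channels each, 8x8 pixels.
    maps = [rng.standard_normal((4, 8, 8)) for _ in range(3)]
    prob = mfi_head(
        maps,
        sse_w=rng.standard_normal((1, 12)), sse_b=np.zeros(1),
        out_w=rng.standard_normal((1, 12)), out_b=np.zeros(1),
    )
    pred = (prob > 0.5).astype(int)  # threshold the probability map
    print(prob.shape, mean_iou(pred, pred))
```

Thresholding the sigmoid output at 0.5 is likewise an illustrative choice; random weights are used here only to check shapes and value ranges.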

Keywords: Pixel-Level Surface Defects Detection, Swin Transformers, Encoder-Decoder Network, Convolutional Neural Network

Review history: Received 21 February 2022, Revised 24 May 2022, Accepted 22 July 2022, Available online 27 July 2022, Version of Record 31 July 2022.

Paper URL: https://doi.org/10.1016/j.eswa.2022.118269