Mask encoding: A general instance mask representation for object segmentation
Authors:
Highlights:
• We propose to encode a two-dimensional binary instance mask into a compact representation vector. The compressed vector takes advantage of the redundancy in the original mask and proves to be effective and efficient for reconstruction.
• Encoding can be done with several dictionary learning methods, including principal component analysis (PCA), sparse coding, and auto-encoders. We integrate this mask representation into the Mask R-CNN framework with slight modifications to the model architecture. Our method consistently improves mask AP by 0.9% on the COCO dataset, 1.4% on the LVIS dataset, and 2.1% on the Cityscapes dataset.
• With this mask representation, we propose a new framework for single-shot instance segmentation that extends FCOS with a mask branch for mask coefficient regression. Our mask encoding is completely independent of the detector's mechanism, so it can easily be incorporated into other object detectors. Our method holds a significant lead in accuracy over other explicit contour-based one-stage frameworks.
• The proposed method is seamlessly extended to video instance segmentation across video frames by adding a vanilla track branch, achieving favourable performance on the YouTube-VIS dataset.
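The core idea in the highlights, compressing a binary mask into a short coefficient vector with a learned linear dictionary and reconstructing it from that vector, can be sketched with PCA, one of the dictionary learning options listed above. This is a minimal NumPy sketch under stated assumptions, not the authors' implementation: the 28×28 mask resolution, the code length K = 60, the 0.5 decoding threshold, and the synthetic ellipse masks produced by the hypothetical helper `random_ellipse_masks` are all illustrative choices that do not come from the paper.

```python
import numpy as np


def fit_pca_dictionary(masks: np.ndarray, num_components: int):
    """Learn a linear mask dictionary with PCA.

    masks: (N, S, S) binary masks, all resized to the same S x S grid.
    Returns the mean mask vector (S*S,) and the top principal axes (K, S*S).
    """
    x = masks.reshape(masks.shape[0], -1).astype(np.float64)
    mean = x.mean(axis=0)
    # Rows of vt are orthonormal principal axes of the centered data,
    # sorted by decreasing singular value.
    _, _, vt = np.linalg.svd(x - mean, full_matrices=False)
    return mean, vt[:num_components]


def encode(masks: np.ndarray, mean: np.ndarray, components: np.ndarray) -> np.ndarray:
    """Project flattened masks onto the dictionary: (N, S, S) -> (N, K) coefficients."""
    x = masks.reshape(masks.shape[0], -1).astype(np.float64)
    return (x - mean) @ components.T


def decode(coeffs, mean, components, size, threshold=0.5):
    """Reconstruct binary masks from (N, K) coefficients by thresholding (assumed 0.5)."""
    x = coeffs @ components + mean
    return x.reshape(-1, size, size) >= threshold


def random_ellipse_masks(n, size, rng):
    """Hypothetical toy data: filled axis-aligned ellipses on a size x size grid."""
    yy, xx = np.mgrid[0:size, 0:size]
    cy, cx = rng.uniform(0.3, 0.7, (2, n)) * size
    ry, rx = rng.uniform(0.15, 0.45, (2, n)) * size
    dist = ((yy - cy[:, None, None]) / ry[:, None, None]) ** 2 + (
        (xx - cx[:, None, None]) / rx[:, None, None]
    ) ** 2
    return dist <= 1.0


def mask_iou(a, b):
    """Per-mask intersection-over-union between two (N, S, S) boolean stacks."""
    inter = np.logical_and(a, b).sum(axis=(1, 2))
    union = np.logical_or(a, b).sum(axis=(1, 2))
    return inter / np.maximum(union, 1)


rng = np.random.default_rng(0)
S, K = 28, 60  # assumed mask resolution and code length, for illustration only
train_masks = random_ellipse_masks(2000, S, rng)
test_masks = random_ellipse_masks(200, S, rng)

mean, components = fit_pca_dictionary(train_masks, K)
codes = encode(test_masks, mean, components)  # each 784-pixel mask -> 60 numbers
recon = decode(codes, mean, components, S)
print(codes.shape)  # (200, 60)
print("mean reconstruction IoU:", mask_iou(test_masks, recon).mean())
```

In this sketch a detector head would only need to regress the K coefficients per instance rather than S × S pixels, with `decode` turning them back into a mask. Swapping PCA for sparse coding or an auto-encoder would change only the `encode`/`decode` pair, leaving the rest of the pipeline unchanged.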
Keywords: Mask encoding, Instance segmentation, Video instance segmentation
Article history: Received 30 July 2021, Revised 25 October 2021, Accepted 20 December 2021, Available online 28 December 2021, Version of Record 3 January 2022.
Official paper URL: https://doi.org/10.1016/j.patcog.2021.108505