S4Net: Single Stage Salient-Instance Segmentation

Ruochen Fan¹ Ming-Ming Cheng² Qibin Hou² Tai-Jiang Mu¹ Jingdong Wang³ Shi-Min Hu¹

¹Tsinghua University ²Nankai University ³Microsoft Research Asia

Abstract

We consider an interesting problem in this paper: salient instance segmentation. In addition to producing bounding boxes, our network also outputs high-quality instance-level segments. Taking into account the category-independent property of each target, we design a single stage salient instance segmentation framework with a novel segmentation branch. Our new branch considers not only the local context inside each detection window but also its surrounding context, enabling us to distinguish instances in the same scope even under occlusion. Our network is end-to-end trainable and runs at a fast speed (40 fps when processing an image with resolution 320 × 320). We evaluate our approach on a publicly available benchmark and show that it outperforms alternative solutions. We also provide a thorough analysis of the design choices to help readers better understand the function of each part of our network. The source code can be found at https://github.com/RuochenFan/S4Net.

Paper

S4Net: Single Stage Salient-Instance Segmentation, Ruochen Fan, Ming-Ming Cheng, Qibin Hou, Tai-Jiang Mu, Jingdong Wang, Shi-Min Hu, CVPR, 2019. [code] [pdf]

If you find our work helpful, please cite:

@article{fan2017s,
  title={S4Net: Single Stage Salient-Instance Segmentation},
  author={Fan, Ruochen and Cheng, Ming-Ming and Hou, Qibin and Mu, Tai-Jiang and Hu, Shi-Min},
  journal={arXiv preprint arXiv:1711.07618},
  year={2017}
}

Contact

644142239 AT qq DOT com (Ruochen Fan)

RoIMasking

We propose RoIMasking to explicitly incorporate foreground/background separation into salient instance segmentation. We mark the region surrounding each object proposal as the initial background and exploit the separation between foreground and background features in our segmentation branch. More specifically, we flip the signs of the feature values surrounding the proposals.
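
The sign-flipping idea can be illustrated with a minimal NumPy sketch. This is not the released implementation: the function names `ternary_roi_mask` and `roi_masking`, the `expand` ratio, and the choice to zero out features beyond the surrounding ring are all assumptions made for illustration. The sketch keeps feature values inside the proposal, flips the sign of values in a ring around it, and multiplies everything else by zero.

```python
import numpy as np


def ternary_roi_mask(height, width, box, expand=0.5):
    """Return an (H, W) mask: +1 inside `box`, -1 in the ring around it, 0 elsewhere.

    `box` is (x0, y0, x1, y1) in feature-map cells, with x1/y1 exclusive.
    `expand` (hypothetical parameter) is the ring width as a fraction of the box size.
    """
    x0, y0, x1, y1 = box
    dx = int(round((x1 - x0) * expand))
    dy = int(round((y1 - y0) * expand))

    # Enlarged box, clipped to the feature-map boundary.
    ex0, ey0 = max(0, x0 - dx), max(0, y0 - dy)
    ex1, ey1 = min(width, x1 + dx), min(height, y1 + dy)

    mask = np.zeros((height, width), dtype=np.float32)
    mask[ey0:ey1, ex0:ex1] = -1.0  # surrounding context: sign will be flipped
    mask[y0:y1, x0:x1] = 1.0       # proposal interior: features kept as-is
    return mask


def roi_masking(features, box, expand=0.5):
    """Apply the ternary mask to a (C, H, W) feature map by broadcasting over channels."""
    _, height, width = features.shape
    mask = ternary_roi_mask(height, width, box, expand)
    return features * mask[None, :, :]


if __name__ == "__main__":
    feats = np.ones((2, 8, 8), dtype=np.float32)
    out = roi_masking(feats, box=(3, 3, 5, 5), expand=0.5)
    print(out[0])
```

With non-negative input features (for example, after a ReLU), the interior of the proposal stays non-negative while the ring becomes non-positive, so the later layers of the segmentation branch can tell the two regions apart by sign alone.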

Network Structure

(a) A brief illustration of our framework. (b) The segmentation branch proposed in Mask R-CNN, which is composed of a stack of consecutive convolutional layers. (c) Our proposed segmentation branch, which further enlarges the receptive field.
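
To make the receptive-field comparison between (b) and (c) concrete, the following sketch computes the receptive field of a stack of convolutional layers using the standard recurrence. The two layer configurations are hypothetical examples chosen for illustration, not the exact layers of either branch; the helper name `receptive_field` is likewise an assumption.

```python
def receptive_field(layers):
    """Receptive field (in input cells) of a sequential stack of conv/pool layers.

    `layers` is a list of (kernel_size, stride, dilation) tuples.
    """
    rf = 1    # receptive field of a single input cell
    jump = 1  # distance in input cells between adjacent outputs of the current layer
    for kernel, stride, dilation in layers:
        effective_kernel = dilation * (kernel - 1) + 1
        rf += (effective_kernel - 1) * jump
        jump *= stride
    return rf


if __name__ == "__main__":
    # Hypothetical plain stack: four 3x3 convolutions, stride 1, no dilation.
    plain = [(3, 1, 1)] * 4
    # Hypothetical variant: the last two convolutions use dilation 2.
    dilated = [(3, 1, 1), (3, 1, 1), (3, 1, 2), (3, 1, 2)]
    print(receptive_field(plain))    # 9
    print(receptive_field(dilated))  # 13
```

In this toy comparison, replacing two plain convolutions with dilated ones grows the receptive field from 9 to 13 cells without adding layers or reducing resolution, which is one common way to let a mask head see more of the context around a proposal.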