S4Net: Single Stage Salient-Instance Segmentation

Ruochen Fan¹  Ming-Ming Cheng²  Qibin Hou²  Tai-Jiang Mu¹  Jingdong Wang³  Shi-Min Hu¹

¹Tsinghua University    ²Nankai University    ³Microsoft Research Asia


In this paper, we consider an interesting problem: salient instance segmentation. In addition to producing bounding boxes, our network also outputs high-quality instance-level segments. Taking into account the category-independent property of each target, we design a single-stage salient instance segmentation framework with a novel segmentation branch. Our new branch considers not only the local context inside each detection window but also its surrounding context, which enables us to distinguish instances in the same scope even under occlusion. Our network is end-to-end trainable and runs fast (40 fps when processing an image with resolution 320 × 320). We evaluate our approach on a publicly available benchmark and show that it outperforms alternative solutions. We also provide a thorough analysis of the design choices to help readers better understand the function of each part of our network. The source code can be found at https://github.com/RuochenFan/S4Net.


  • S4Net: Single Stage Salient-Instance Segmentation, Ruochen Fan, Ming-Ming Cheng, Qibin Hou, Tai-Jiang Mu, Jingdong Wang, Shi-Min Hu, CVPR, 2019.

If you find our work helpful, please cite:

  title={S4Net: Single Stage Salient-Instance Segmentation},
  author={Fan, Ruochen and Cheng, Ming-Ming and Hou, Qibin and Mu, Tai-Jiang and Hu, Shi-Min},
  journal={arXiv preprint arXiv:1711.07618},


644142239 AT qq DOT com  (Ruochen Fan)


We propose RoIMasking, which explicitly incorporates foreground/background separation to improve salient instance segmentation. We mark the region surrounding each object proposal as initial background, and our segmentation branch then exploits the resulting foreground/background feature separation. Specifically, we flip the signs of the feature values in the region surrounding each proposal.
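To make the sign-flipping idea concrete, here is a minimal NumPy sketch of a ternary mask applied to a (C, H, W) feature map: features inside the proposal are kept (×1), features in a band around it are negated (×−1), and everything farther away is zeroed (×0). The band width, controlled by the hypothetical parameter `expand_ratio`, and the zeroing outside the enlarged window are assumptions made for illustration; `ternary_roi_mask` is a hypothetical helper, not a function from the S4Net code base.

```python
import numpy as np


def ternary_roi_mask(feature, box, expand_ratio=1 / 3):
    """Apply a RoIMasking-style ternary mask to a (C, H, W) feature map.

    box: (x1, y1, x2, y2) in feature-map cells, with x2/y2 exclusive.
    expand_ratio: hypothetical fraction of the box width/height added on
    each side to form the surrounding "background" band.
    """
    _, h, w = feature.shape
    x1, y1, x2, y2 = box
    dx = int(round((x2 - x1) * expand_ratio))
    dy = int(round((y2 - y1) * expand_ratio))

    # Enlarged window around the proposal, clipped to the feature map.
    ex1, ey1 = max(x1 - dx, 0), max(y1 - dy, 0)
    ex2, ey2 = min(x2 + dx, w), min(y2 + dy, h)

    mask = np.zeros((h, w), dtype=feature.dtype)  # far away: suppressed (0)
    mask[ey1:ey2, ex1:ex2] = -1                   # surrounding band: sign flipped
    mask[y1:y2, x1:x2] = 1                        # inside the proposal: kept
    return feature * mask                         # broadcasts over channels


# Usage: a 3x3 proposal on an 8x8 map of ones gets a 1-cell band around it.
out = ternary_roi_mask(np.ones((2, 8, 8)), (2, 2, 5, 5))
print(out[0, 3, 3], out[0, 1, 1], out[0, 0, 0])  # 1.0 -1.0 0.0
```

Because the band carries negated activations rather than zeros, later convolutions can still "see" the surrounding context while being told explicitly that it is likely background.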

Network Structure


(a) A brief illustration of our framework. (b) The segmentation branch proposed in Mask R-CNN, which is composed of a stack of consecutive convolutional layers. (c) Our proposed segmentation branch, which further enlarges the receptive field.
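The caption contrasts a plain stack of convolutions with a branch that has a larger receptive field. The short calculation below shows how receptive field grows layer by layer. It compares four stride-1 3×3 convolutions (the layout of the Mask R-CNN mask head) with a hypothetical variant whose last two layers use dilation 2. Dilation is only one assumed way to enlarge the receptive field and is not necessarily how the S4Net branch is built; `receptive_field` is an illustrative helper.

```python
def receptive_field(layers):
    """Receptive field (in input cells) of a stack of conv/pool layers.

    layers: list of (kernel_size, stride, dilation) tuples, applied in order.
    """
    rf, jump = 1, 1  # jump = distance in input cells between adjacent outputs
    for kernel, stride, dilation in layers:
        effective_kernel = dilation * (kernel - 1) + 1
        rf += (effective_kernel - 1) * jump
        jump *= stride
    return rf


plain_stack = [(3, 1, 1)] * 4                            # Mask R-CNN-style head
dilated_stack = [(3, 1, 1), (3, 1, 1), (3, 1, 2), (3, 1, 2)]  # hypothetical variant

print(receptive_field(plain_stack))    # 9
print(receptive_field(dilated_stack))  # 13
```

With the same number of layers and parameters, the dilated variant covers a wider window, which is the property a branch needs if it is to look beyond the proposal into its surrounding context.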

Visualization Results