# Panoptic-DeepLab
Panoptic-DeepLab is a state-of-the-art **box-free** system for panoptic
segmentation [1], where the goal is to assign a unique value, encoding both
the semantic label (e.g., person, car) and the instance ID (e.g., instance_1,
instance_2), to every pixel in an image.

Panoptic-DeepLab improves over DeeperLab [6], one of the first box-free
systems for panoptic segmentation, which combines DeepLabv3+ [7] and
PersonLab [8], by simplifying the class-agnostic instance detection to only
use a center keypoint. As a result, Panoptic-DeepLab predicts three outputs:
(1) semantic segmentation, (2) instance center heatmap, and (3) instance
center regression.

The class-agnostic instance segmentation is first obtained by grouping the
predicted foreground pixels (inferred from the semantic segmentation) to their
closest predicted instance centers [2]. To generate the final panoptic
segmentation, we then fuse the class-agnostic instance segmentation with the
semantic segmentation by the efficient majority-vote scheme [6].
<p align="center">
<img src="../img/panoptic_deeplab.png" width=800>
</p>
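
To make the three outputs and the two post-processing steps concrete, below is
a minimal NumPy/SciPy sketch of the grouping and majority-vote fusion described
above. It is an illustration, not the implementation in this repository: the
function name, the peak-extraction window, the default threshold, and the
`semantic * label_divisor + instance_id` panoptic encoding are assumptions made
for the example.

```
import numpy as np
from scipy.ndimage import maximum_filter

def panoptic_fusion_sketch(semantic, center_heatmap, center_offsets,
                           thing_ids, threshold=0.1, label_divisor=256):
  # semantic:       (H, W) int array of predicted semantic labels.
  # center_heatmap: (H, W) float array of instance-center confidence.
  # center_offsets: (H, W, 2) float array of per-pixel (dy, dx) offsets
  #                 pointing to each pixel's instance center.
  # thing_ids:      semantic IDs of countable "thing" classes.
  h, w = semantic.shape

  # 1. Keypoint extraction: local maxima of the heatmap above a confidence
  #    threshold become instance centers (a max filter stands in for NMS).
  peaks = (center_heatmap == maximum_filter(center_heatmap, size=7))
  centers = np.argwhere(peaks & (center_heatmap > threshold))  # (K, 2)

  # 2. Grouping: every foreground pixel is assigned to the center closest
  #    to its regressed position (pixel coordinate + predicted offset).
  ys, xs = np.mgrid[0:h, 0:w]
  regressed = np.stack([ys + center_offsets[..., 0],
                        xs + center_offsets[..., 1]], axis=-1)
  foreground = np.isin(semantic, list(thing_ids))
  panoptic = semantic * label_divisor  # "stuff" pixels keep instance ID 0
  if len(centers) == 0 or not foreground.any():
    return panoptic
  dists = np.linalg.norm(
      regressed[foreground][:, None, :] - centers[None].astype(float), axis=-1)
  instance = np.zeros((h, w), dtype=np.int32)
  instance[foreground] = dists.argmin(axis=-1) + 1  # instance IDs start at 1

  # 3. Majority-vote fusion: each instance takes the most frequent semantic
  #    label inside its mask; panoptic label = semantic * divisor + ID.
  for k in range(1, instance.max() + 1):
    mask = instance == k
    if mask.any():
      majority = np.bincount(semantic[mask]).argmax()
      panoptic[mask] = majority * label_divisor + k
  return panoptic
```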
## Prerequisite
1. Make sure the software is properly [installed](../setup/installation.md).
2. Make sure the target dataset is correctly prepared (e.g.,
[Cityscapes](../setup/cityscapes.md), [COCO](../setup/coco.md)).
3. Download the ImageNet pretrained
[checkpoints](./imagenet_pretrained_checkpoints.md), and update the
`initial_checkpoint` path in the config files (one way to fetch and unpack a
checkpoint tarball is sketched below).
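
For step 3, the snippet below sketches one way to fetch and unpack a checkpoint
tarball in plain Python. The URL is one of the Cityscapes checkpoints from the
tables below and the local paths are arbitrary examples; for this step you
would substitute the ImageNet-pretrained checkpoint URL you picked from the
linked page.

```
import tarfile
import urllib.request

# Example checkpoint tarball (a Cityscapes-trained one from the tables below);
# substitute the ImageNet-pretrained checkpoint URL you actually need.
URL = ('https://storage.googleapis.com/gresearch/tf-deeplab/checkpoint/'
       'resnet50_os32_panoptic_deeplab_cityscapes_trainfine.tar.gz')

fname, _ = urllib.request.urlretrieve(URL, 'checkpoint.tar.gz')
with tarfile.open(fname) as tar:
  tar.extractall('pretrained')  # point `initial_checkpoint` at the result
```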
## Model Zoo
In the Model Zoo, we explore building Panoptic-DeepLab on top of several
backbones (e.g., ResNet model variants [3]).
Herein, we highlight some of the employed backbones:
1. **ResNet-50-Beta**: We replace the original stem in ResNet-50 [3] with the
Inception stem [9], i.e., the first 7x7 convolution is replaced by three
3x3 convolutions.
2. **Wide-ResNet-41**: We modify the Wide-ResNet-38 [5] by (1) removing the
last residual block, and (2) repeating the second last residual block two
more times.
3. **SWideRNet-SAC-(1, 1, x)**, where x ∈ {1, 3, 4.5}: scales the backbone
layers (excluding the stem) of Wide-ResNet-41 by a factor of x. This
backbone only employs the Switchable Atrous Convolution (SAC), without the
Squeeze-and-Excitation modules [10] (SAC and the Inception-style stem from
item 1 are both sketched below).
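
To illustrate backbones 1 and 3, here is a hedged tf.keras sketch of (a) the
Inception-style stem that distinguishes ResNet-50-Beta and (b) a Switchable
Atrous Convolution block. The filter counts, the switch design (average
pooling followed by a 1x1 convolution), and the omission of weight sharing
between the two atrous branches are simplifying assumptions, not the exact
code in this repository.

```
import tensorflow as tf
from tensorflow.keras import layers

def beta_stem(x):
  # ResNet-50-Beta stem: the single 7x7/stride-2 convolution of the original
  # ResNet stem is replaced by three stacked 3x3 convolutions. The filter
  # counts (64, 64, 128) are an assumption for this sketch.
  for i, (filters, stride) in enumerate([(64, 2), (64, 1), (128, 1)]):
    x = layers.Conv2D(filters, 3, strides=stride, padding='same',
                      use_bias=False, name=f'stem_conv3x3_{i}')(x)
    x = layers.BatchNormalization(name=f'stem_bn_{i}')(x)
    x = layers.Activation('relu', name=f'stem_relu_{i}')(x)
  return x

def switchable_atrous_conv(x, filters, name='sac'):
  # Switchable Atrous Convolution: run the same 3x3 convolution at two atrous
  # rates and blend the results with a learned, spatially varying switch
  # S(x) in [0, 1]. The weight sharing (w and w + delta_w) used in DetectoRS
  # is omitted here for brevity.
  pooled = layers.AveragePooling2D(5, strides=1, padding='same')(x)
  switch = layers.Conv2D(1, 1, activation='sigmoid',
                         name=f'{name}_switch')(pooled)
  rate1 = layers.Conv2D(filters, 3, padding='same', dilation_rate=1,
                        name=f'{name}_rate1')(x)
  rate3 = layers.Conv2D(filters, 3, padding='same', dilation_rate=3,
                        name=f'{name}_rate3')(x)
  return switch * rate1 + (1.0 - switch) * rate3
```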
### Cityscapes Panoptic Segmentation
We provide checkpoints trained on the Cityscapes train-fine set below. If you
would like to train those models yourself, please find the corresponding
config files under the directory
[configs/cityscapes/panoptic_deeplab](../../configs/cityscapes/panoptic_deeplab).

All reported results are obtained with *single-scale* inference and
*ImageNet-1K* pretrained checkpoints.
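
As a reminder, the panoptic quality (PQ) metric [1] reported in these tables
averages the IoU over matched (true-positive) segment pairs while penalizing
unmatched predictions and unmatched ground-truth segments:

$$\text{PQ} = \frac{\sum_{(p, g) \in \mathit{TP}} \text{IoU}(p, g)}{|\mathit{TP}| + \tfrac{1}{2}|\mathit{FP}| + \tfrac{1}{2}|\mathit{FN}|}$$
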
Backbone | Output stride | Input resolution | PQ [*] | mIoU [*] | PQ [**] | mIoU [**] | AP<sup>Mask</sup> [**]
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | :-----------: | :---------------: | :----: | :------: | :-----: | :-------: | :--------------------:
MobileNetV3-S ([config](../../configs/cityscapes/panoptic_deeplab/mobilenet_v3_small_os32.textproto), [ckpt](https://storage.googleapis.com/gresearch/tf-deeplab/checkpoint/mobilenet_v3_small_os32_panoptic_deeplab_cityscapes_trainfine.tar.gz)) | 32 | 1025 x 2049 | 46.7 | 69.5 | 46.92 | 69.8 | 16.53
MobileNetV3-L ([config](../../configs/cityscapes/panoptic_deeplab/mobilenet_v3_large_os32.textproto), [ckpt](https://storage.googleapis.com/gresearch/tf-deeplab/checkpoint/mobilenet_v3_large_os32_panoptic_deeplab_cityscapes_trainfine.tar.gz)) | 32 | 1025 x 2049 | 52.7 | 73.8 | 53.07 | 74.15 | 22.58
ResNet-50 ([config](../../configs/cityscapes/panoptic_deeplab/resnet50_os32_merge_with_pure_tf_func.textproto), [ckpt](https://storage.googleapis.com/gresearch/tf-deeplab/checkpoint/resnet50_os32_panoptic_deeplab_cityscapes_trainfine.tar.gz)) | 32 | 1025 x 2049 | 59.8 | 76.0 | 60.24 | 76.36 | 30.01
ResNet-50-Beta ([config](../../configs/cityscapes/panoptic_deeplab/resnet50_beta_os32.textproto), [ckpt](https://storage.googleapis.com/gresearch/tf-deeplab/checkpoint/resnet50_beta_os32_panoptic_deeplab_cityscapes_trainfine.tar.gz)) | 32 | 1025 x 2049 | 60.8 | 77.0 | 61.16 | 77.37 | 31.58
Wide-ResNet-41 ([config](../../configs/cityscapes/panoptic_deeplab/wide_resnet41_os16.textproto), [ckpt](https://storage.googleapis.com/gresearch/tf-deeplab/checkpoint/wide_resnet41_os16_panoptic_deeplab_cityscapes_trainfine.tar.gz)) | 16 | 1025 x 2049 | 64.4 | 81.5 | 64.83 | 81.92 | 36.07
SWideRNet-SAC-(1, 1, 1) ([config](../../configs/cityscapes/panoptic_deeplab/swidernet_sac_1_1_1_os16.textproto), [ckpt](https://storage.googleapis.com/gresearch/tf-deeplab/checkpoint/swidernet_sac_1_1_1_os16_panoptic_deeplab_cityscapes_trainfine.tar.gz)) | 16 | 1025 x 2049 | 64.3 | 81.8 | 64.81 | 82.24 | 36.80
SWideRNet-SAC-(1, 1, 3) ([config](../../configs/cityscapes/panoptic_deeplab/swidernet_sac_1_1_3_os16.textproto), [ckpt](https://storage.googleapis.com/gresearch/tf-deeplab/checkpoint/swidernet_sac_1_1_3_os16_panoptic_deeplab_cityscapes_trainfine.tar.gz)) | 16 | 1025 x 2049 | 66.6 | 82.1 | 67.05 | 82.67 | 38.59
SWideRNet-SAC-(1, 1, 4.5) ([config](../../configs/cityscapes/panoptic_deeplab/swidernet_sac_1_1_4.5_os16.textproto), [ckpt](https://storage.googleapis.com/gresearch/tf-deeplab/checkpoint/swidernet_sac_1_1_4.5_os16_panoptic_deeplab_cityscapes_trainfine.tar.gz)) | 16 | 1025 x 2049 | 66.8 | 82.2 | 67.29 | 82.74 | 39.51
\[*]: Results evaluated by the official script. Instance segmentation
evaluation is not supported yet (our prediction format would first need to be
converted).

\[**]: Results evaluated by our pipeline. See Q4 in [FAQ](../faq.md).
### COCO Panoptic Segmentation
We provide checkpoints trained on the COCO train set below. If you would like
to train those models yourself, please find the corresponding config files
under the directory
[configs/coco/panoptic_deeplab](../../configs/coco/panoptic_deeplab).

All reported results are obtained with *single-scale* inference and
*ImageNet-1K* pretrained checkpoints.

Backbone | Output stride | Input resolution | PQ [*] | PQ [**] | mIoU [**] | AP<sup>Mask</sup> [**]
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | :-----------: | :---------------: | :----: | :-----: | :-------: | :--------------------:
ResNet-50 ([config](../../configs/coco/panoptic_deeplab/resnet50_os32.textproto), [ckpt](https://storage.googleapis.com/gresearch/tf-deeplab/checkpoint/resnet50_os32_panoptic_deeplab_coco_train_2.tar.gz)) | 32 | 641 x 641 | 34.1 | 34.60 | 54.75 | 18.50
ResNet-50-Beta ([config](../../configs/coco/panoptic_deeplab/resnet50_beta_os32.textproto), [ckpt](https://storage.googleapis.com/gresearch/tf-deeplab/checkpoint/resnet50beta_os32_panoptic_deeplab_coco_train.tar.gz)) | 32 | 641 x 641 | 34.6 | 35.10 | 54.98 | 19.24
ResNet-50 ([config](../../configs/coco/panoptic_deeplab/resnet50_os16.textproto), [ckpt](https://storage.googleapis.com/gresearch/tf-deeplab/checkpoint/resnet50_os16_panoptic_deeplab_coco_train.tar.gz)) | 16 | 641 x 641 | 35.1 | 35.67 | 55.52 | 19.40
ResNet-50-Beta ([config](../../configs/coco/panoptic_deeplab/resnet50_beta_os16.textproto), [ckpt](https://storage.googleapis.com/gresearch/tf-deeplab/checkpoint/resnet50beta_os16_panoptic_deeplab_coco_train.tar.gz)) | 16 | 641 x 641 | 35.2 | 35.76 | 55.45 | 19.63
\[*]: Results evaluated by the official script.

\[**]: Results evaluated by our pipeline. See Q4 in [FAQ](../faq.md).
## Citing Panoptic-DeepLab
If you find this code helpful in your research or wish to refer to the baseline
results, please use the following BibTeX entry.
* Panoptic-DeepLab:
```
@inproceedings{panoptic_deeplab_2020,
author={Bowen Cheng and Maxwell D Collins and Yukun Zhu and Ting Liu and Thomas S Huang and Hartwig Adam and Liang-Chieh Chen},
title={{Panoptic-DeepLab}: A Simple, Strong, and Fast Baseline for Bottom-Up Panoptic Segmentation},
booktitle={CVPR},
year={2020}
}
```
If you use the Wide-ResNet-41 backbone, please consider citing
* Naive-Student:
```
@inproceedings{naive_student_2020,
title={{Naive-Student: Leveraging Semi-Supervised Learning in Video Sequences for Urban Scene Segmentation}},
author={Chen, Liang-Chieh and Lopes, Raphael Gontijo and Cheng, Bowen and Collins, Maxwell D and Cubuk, Ekin D and Zoph, Barret and Adam, Hartwig and Shlens, Jonathon},
booktitle={ECCV},
year={2020}
}
```
If you use the SWideRNet backbone with Switchable Atrous Convolution,
please consider citing
* SWideRNet:
```
@article{swidernet_2020,
title={Scaling Wide Residual Networks for Panoptic Segmentation},
author={Chen, Liang-Chieh and Wang, Huiyu and Qiao, Siyuan},
journal={arXiv:2011.11675},
year={2020}
}
```
* Switchable Atrous Convolution (SAC):
```
@inproceedings{detectors_2021,
title={{DetectoRS}: Detecting Objects with Recursive Feature Pyramid and Switchable Atrous Convolution},
author={Qiao, Siyuan and Chen, Liang-Chieh and Yuille, Alan},
booktitle={CVPR},
year={2021}
}
```
If you use the MobileNetV3 backbone, please consider citing
* MobileNetV3:
```
@inproceedings{howard2019searching,
title={Searching for {MobileNetV3}},
author={Howard, Andrew and Sandler, Mark and Chu, Grace and Chen, Liang-Chieh and Chen, Bo and Tan, Mingxing and Wang, Weijun and Zhu, Yukun and Pang, Ruoming and Vasudevan, Vijay and others},
booktitle={ICCV},
year={2019}
}
```
### References
1. Alexander Kirillov, Kaiming He, Ross Girshick, Carsten Rother, and Piotr
Dollar. "Panoptic segmentation." In CVPR, 2019.
2. Alex Kendall, Yarin Gal, and Roberto Cipolla. "Multi-task learning using
uncertainty to weigh losses for scene geometry and semantics." In CVPR, 2018.
3. Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. "Deep residual
learning for image recognition." In CVPR, 2016.
4. Sergey Zagoruyko and Nikos Komodakis. "Wide residual networks." In BMVC,
2016.
5. Zifeng Wu, Chunhua Shen, and Anton van den Hengel. "Wider or deeper:
Revisiting the ResNet model for visual recognition." Pattern Recognition,
2019.
6. Tien-Ju Yang, Maxwell D Collins, Yukun Zhu, Jyh-Jing Hwang, Ting Liu,
Xiao Zhang, Vivienne Sze, George Papandreou, and Liang-Chieh Chen.
"DeeperLab: Single-shot image parser." arXiv:1902.05093, 2019.
7. Liang-Chieh Chen, Yukun Zhu, George Papandreou, Florian Schroff, and
Hartwig Adam. "Encoder-decoder with atrous separable convolution for
semantic image segmentation." In ECCV, 2018.
8. George Papandreou, Tyler Zhu, Liang-Chieh Chen, Spyros Gidaris,
Jonathan Tompson, and Kevin Murphy. "PersonLab: Person pose estimation
and instance segmentation with a bottom-up, part-based, geometric embedding
model." In ECCV, 2018.
9. Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jon Shlens, and
Zbigniew Wojna. "Rethinking the inception architecture for computer
vision." In CVPR, 2016.
10. Jie Hu, Li Shen, and Gang Sun. "Squeeze-and-excitation networks."
In CVPR, 2018.