# Panoptic-DeepLab

Panoptic-DeepLab is a state-of-the-art **box-free** system for panoptic
segmentation [1], where the goal is to assign a unique value, encoding both
semantic label (e.g., person, car) and instance ID (e.g., instance_1,
instance_2), to every pixel in an image.
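
Concretely, this unique value is typically formed as
`semantic_label * label_divisor + instance_id`, where the label divisor exceeds
the maximum number of instances per image. The sketch below illustrates this
packing; the divisor value of 1000 and the function names are illustrative
assumptions, not values taken from any particular config.

```
import numpy as np

# A minimal sketch of the panoptic encoding: each pixel packs its semantic
# label and instance ID into a single integer. The divisor of 1000 is
# illustrative; it only needs to exceed the per-image instance count.
LABEL_DIVISOR = 1000

def encode_panoptic(semantic, instance, divisor=LABEL_DIVISOR):
  """Packs per-pixel semantic labels and instance IDs into one map."""
  return semantic * divisor + instance

def decode_panoptic(panoptic, divisor=LABEL_DIVISOR):
  """Recovers the (semantic, instance) maps from a packed panoptic map."""
  return panoptic // divisor, panoptic % divisor

# Example: two 'car' pixels from different instances, plus 'stuff' pixels
# (e.g., road), which conventionally carry instance ID 0.
semantic = np.array([[13, 13], [0, 0]])
instance = np.array([[1, 2], [0, 0]])
panoptic = encode_panoptic(semantic, instance)
sem_back, inst_back = decode_panoptic(panoptic)
assert (sem_back == semantic).all() and (inst_back == instance).all()
```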

Panoptic-DeepLab improves over DeeperLab [6], one of the first box-free systems
for panoptic segmentation (combining DeepLabv3+ [7] and PersonLab [8]), by
simplifying class-agnostic instance detection to a single center keypoint. As a
result, Panoptic-DeepLab predicts three outputs: (1) semantic segmentation, (2)
instance center heatmap, and (3) instance center regression.

The class-agnostic instance segmentation is first obtained by grouping the
predicted foreground pixels (inferred from the semantic segmentation) to their
closest predicted instance centers [2]. To generate the final panoptic
segmentation, we then fuse the class-agnostic instance segmentation with the
semantic segmentation via the efficient majority-vote scheme [6].

<p align="center">
   <img src="../img/panoptic_deeplab.png" width=800>
</p>
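
To make the three predictions and the fusion step concrete, below is a minimal
NumPy sketch. It is not the repo's TensorFlow implementation: the keypoint-style
center selection (max-pooling NMS) is replaced with a naive confidence
threshold, and the function name, `center_threshold`, and `label_divisor`
defaults are illustrative assumptions.

```
import numpy as np

def simple_panoptic_fusion(semantic, center_heatmap, offsets, thing_ids,
                           label_divisor=1000, center_threshold=0.1):
  """Sketch of Panoptic-DeepLab's grouping and majority-vote fusion.

  Args:
    semantic: (H, W) int array of predicted semantic labels.
    center_heatmap: (H, W) float array of instance-center confidence.
    offsets: (H, W, 2) float array; offsets[y, x] points from pixel (y, x)
      towards its predicted instance center.
    thing_ids: iterable of semantic IDs treated as countable 'things'.
  """
  h, w = semantic.shape
  # 1. Select instance centers. The paper uses keypoint-style NMS on the
  #    heatmap; a naive threshold stands in here for brevity.
  centers = np.argwhere(center_heatmap > center_threshold)  # (K, 2)
  if len(centers) == 0:
    return semantic * label_divisor  # No instances detected.

  # 2. Group every pixel with the closest center its offset points at.
  ys, xs = np.mgrid[0:h, 0:w]
  pointed = np.stack([ys + offsets[..., 0], xs + offsets[..., 1]], axis=-1)
  dists = np.linalg.norm(pointed[:, :, None, :] - centers[None, None],
                         axis=-1)                           # (H, W, K)
  instance = dists.argmin(axis=-1) + 1                      # 1-based IDs

  # 3. Keep instance IDs only on foreground ('thing') pixels, as inferred
  #    by the semantic prediction; 'stuff' pixels get instance ID 0.
  instance = np.where(np.isin(semantic, list(thing_ids)), instance, 0)

  # 4. Majority vote: each instance adopts the most frequent semantic label
  #    among its pixels; results are packed into one panoptic map.
  panoptic = semantic * label_divisor
  for inst_id in np.unique(instance[instance > 0]):
    mask = instance == inst_id
    majority = np.bincount(semantic[mask]).argmax()
    panoptic[mask] = majority * label_divisor + inst_id
  return panoptic
```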

## Prerequisites

1. Make sure the software is properly [installed](../setup/installation.md).
2. Make sure the target dataset is correctly prepared (e.g.,
   [Cityscapes](../setup/cityscapes.md), [COCO](../setup/coco.md)).
3. Download the ImageNet pretrained
   [checkpoints](./imagenet_pretrained_checkpoints.md), and update the
   `initial_checkpoint` path in the config files, as illustrated below.
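
For example, after downloading a checkpoint, point `initial_checkpoint` at it.
The excerpt below is only illustrative: the checkpoint path is hypothetical,
and the exact fields surrounding `initial_checkpoint` differ between config
files, so consult the actual textproto you are editing.

```
# Illustrative textproto excerpt; replace the hypothetical path below with
# the location of the downloaded ImageNet pretrained checkpoint.
model_options {
  initial_checkpoint: "/path/to/imagenet_pretrained_checkpoint/ckpt"
  backbone {
    name: "resnet50"
    output_stride: 32
  }
}
```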

## Model Zoo

In the Model Zoo, we explore building Panoptic-DeepLab on top of several
backbones (e.g., ResNet model variants [3]). Herein, we highlight some of the
employed backbones:

1. **ResNet-50-Beta**: We replace the original stem in ResNet-50 [3] with the
   Inception stem [9], i.e., the first original 7x7 convolution is replaced
   by three 3x3 convolutions (see the sketch after this list).
2. **Wide-ResNet-41**: We modify Wide-ResNet-38 [5] by (1) removing the
   last residual block, and (2) repeating the second-to-last residual block
   two more times.
3. **SWideRNet-SAC-(1, 1, x)**, where x ∈ {1, 3, 4.5}, scales the backbone
   layers (excluding the stem) of Wide-ResNet-41 by a factor of x. This
   backbone only employs the Switchable Atrous Convolution (SAC), without the
   Squeeze-and-Excitation modules [10].
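
As a sketch of the stem swap in ResNet-50-Beta, the following tf.keras snippet
contrasts the two stems. The filter counts and function names are illustrative
assumptions, not values taken from the repo.

```
import tensorflow as tf

def original_stem(filters=64):
  """Standard ResNet stem: one 7x7 convolution with stride 2."""
  return tf.keras.Sequential([
      tf.keras.layers.Conv2D(filters, 7, strides=2, padding='same',
                             use_bias=False),
      tf.keras.layers.BatchNormalization(),
      tf.keras.layers.ReLU(),
  ])

def inception_stem(filters=(64, 64, 128)):
  """'Beta' stem: three 3x3 convolutions replace the 7x7 convolution.

  Only the first convolution uses stride 2, so the overall stride and the
  receptive field (three 3x3s ~ one 7x7) match the original stem.
  """
  layers = []
  for i, num_filters in enumerate(filters):
    layers.extend([
        tf.keras.layers.Conv2D(num_filters, 3, strides=2 if i == 0 else 1,
                               padding='same', use_bias=False),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.ReLU(),
    ])
  return tf.keras.Sequential(layers)
```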

### Cityscapes Panoptic Segmentation

We provide checkpoints pretrained on the Cityscapes train-fine set below. If
you would like to train these models yourself, please find the corresponding
config files under the directory
[configs/cityscapes/panoptic_deeplab](../../configs/cityscapes/panoptic_deeplab).

All the reported results are obtained with *single-scale* inference and
*ImageNet-1K* pretrained checkpoints.

Backbone | Output stride | Input resolution | PQ [*] | mIoU [*] | PQ [**] | mIoU [**] | AP<sup>Mask</sup> [**]
-------- | :-----------: | :--------------: | :----: | :------: | :-----: | :-------: | :--------------------:
MobilenetV3-S ([config](../../configs/cityscapes/panoptic_deeplab/mobilenet_v3_small_os32.textproto), [ckpt](https://storage.googleapis.com/gresearch/tf-deeplab/checkpoint/mobilenet_v3_small_os32_panoptic_deeplab_cityscapes_trainfine.tar.gz)) | 32 | 1025 x 2049 | 46.7 | 69.5 | 46.92 | 69.8 | 16.53
MobilenetV3-L ([config](../../configs/cityscapes/panoptic_deeplab/mobilenet_v3_large_os32.textproto), [ckpt](https://storage.googleapis.com/gresearch/tf-deeplab/checkpoint/mobilenet_v3_large_os32_panoptic_deeplab_cityscapes_trainfine.tar.gz)) | 32 | 1025 x 2049 | 52.7 | 73.8 | 53.07 | 74.15 | 22.58
ResNet-50 ([config](../../configs/cityscapes/panoptic_deeplab/resnet50_os32_merge_with_pure_tf_func.textproto), [ckpt](https://storage.googleapis.com/gresearch/tf-deeplab/checkpoint/resnet50_os32_panoptic_deeplab_cityscapes_trainfine.tar.gz)) | 32 | 1025 x 2049 | 59.8 | 76.0 | 60.24 | 76.36 | 30.01
ResNet-50-Beta ([config](../../configs/cityscapes/panoptic_deeplab/resnet50_beta_os32.textproto), [ckpt](https://storage.googleapis.com/gresearch/tf-deeplab/checkpoint/resnet50_beta_os32_panoptic_deeplab_cityscapes_trainfine.tar.gz)) | 32 | 1025 x 2049 | 60.8 | 77.0 | 61.16 | 77.37 | 31.58
Wide-ResNet-41 ([config](../../configs/cityscapes/panoptic_deeplab/wide_resnet41_os16.textproto), [ckpt](https://storage.googleapis.com/gresearch/tf-deeplab/checkpoint/wide_resnet41_os16_panoptic_deeplab_cityscapes_trainfine.tar.gz)) | 16 | 1025 x 2049 | 64.4 | 81.5 | 64.83 | 81.92 | 36.07
SWideRNet-SAC-(1, 1, 1) ([config](../../configs/cityscapes/panoptic_deeplab/swidernet_sac_1_1_1_os16.textproto), [ckpt](https://storage.googleapis.com/gresearch/tf-deeplab/checkpoint/swidernet_sac_1_1_1_os16_panoptic_deeplab_cityscapes_trainfine.tar.gz)) | 16 | 1025 x 2049 | 64.3 | 81.8 | 64.81 | 82.24 | 36.80
SWideRNet-SAC-(1, 1, 3) ([config](../../configs/cityscapes/panoptic_deeplab/swidernet_sac_1_1_3_os16.textproto), [ckpt](https://storage.googleapis.com/gresearch/tf-deeplab/checkpoint/swidernet_sac_1_1_3_os16_panoptic_deeplab_cityscapes_trainfine.tar.gz)) | 16 | 1025 x 2049 | 66.6 | 82.1 | 67.05 | 82.67 | 38.59
SWideRNet-SAC-(1, 1, 4.5) ([config](../../configs/cityscapes/panoptic_deeplab/swidernet_sac_1_1_4.5_os16.textproto), [ckpt](https://storage.googleapis.com/gresearch/tf-deeplab/checkpoint/swidernet_sac_1_1_4.5_os16_panoptic_deeplab_cityscapes_trainfine.tar.gz)) | 16 | 1025 x 2049 | 66.8 | 82.2 | 67.29 | 82.74 | 39.51

[*]: Results evaluated by the official script. Instance segmentation
evaluation is not supported yet (it requires converting our prediction
format).

[**]: Results evaluated by our pipeline. See Q4 in [FAQ](../faq.md).

### COCO Panoptic Segmentation

We provide checkpoints pretrained on the COCO train set below. If you would
like to train these models yourself, please find the corresponding config files
under the directory
[configs/coco/panoptic_deeplab](../../configs/coco/panoptic_deeplab).

All the reported results are obtained with *single-scale* inference and
*ImageNet-1K* pretrained checkpoints.

Backbone | Output stride | Input resolution | PQ [*] | PQ [**] | mIoU [**] | AP<sup>Mask</sup> [**]
-------- | :-----------: | :--------------: | :----: | :-----: | :-------: | :--------------------:
ResNet-50 ([config](../../configs/coco/panoptic_deeplab/resnet50_os32.textproto), [ckpt](https://storage.googleapis.com/gresearch/tf-deeplab/checkpoint/resnet50_os32_panoptic_deeplab_coco_train_2.tar.gz)) | 32 | 641 x 641 | 34.1 | 34.60 | 54.75 | 18.50
ResNet-50-Beta ([config](../../configs/coco/panoptic_deeplab/resnet50_beta_os32.textproto), [ckpt](https://storage.googleapis.com/gresearch/tf-deeplab/checkpoint/resnet50beta_os32_panoptic_deeplab_coco_train.tar.gz)) | 32 | 641 x 641 | 34.6 | 35.10 | 54.98 | 19.24
ResNet-50 ([config](../../configs/coco/panoptic_deeplab/resnet50_os16.textproto), [ckpt](https://storage.googleapis.com/gresearch/tf-deeplab/checkpoint/resnet50_os16_panoptic_deeplab_coco_train.tar.gz)) | 16 | 641 x 641 | 35.1 | 35.67 | 55.52 | 19.40
ResNet-50-Beta ([config](../../configs/coco/panoptic_deeplab/resnet50_beta_os16.textproto), [ckpt](https://storage.googleapis.com/gresearch/tf-deeplab/checkpoint/resnet50beta_os16_panoptic_deeplab_coco_train.tar.gz)) | 16 | 641 x 641 | 35.2 | 35.76 | 55.45 | 19.63

[*]: Results evaluated by the official script.

[**]: Results evaluated by our pipeline. See Q4 in [FAQ](../faq.md).

## Citing Panoptic-DeepLab

If you find this code helpful in your research or wish to refer to the baseline
results, please use the following BibTeX entry.

* Panoptic-DeepLab:

```
@inproceedings{panoptic_deeplab_2020,
  author={Bowen Cheng and Maxwell D Collins and Yukun Zhu and Ting Liu and Thomas S Huang and Hartwig Adam and Liang-Chieh Chen},
  title={{Panoptic-DeepLab}: A Simple, Strong, and Fast Baseline for Bottom-Up Panoptic Segmentation},
  booktitle={CVPR},
  year={2020}
}
```

If you use the Wide-ResNet-41 backbone, please consider citing

* Naive-Student:

```
@inproceedings{naive_student_2020,
  title={{Naive-Student: Leveraging Semi-Supervised Learning in Video Sequences for Urban Scene Segmentation}},
  author={Chen, Liang-Chieh and Lopes, Raphael Gontijo and Cheng, Bowen and Collins, Maxwell D and Cubuk, Ekin D and Zoph, Barret and Adam, Hartwig and Shlens, Jonathon},
  booktitle={ECCV},
  year={2020}
}
```

If you use the SWideRNet backbone with Switchable Atrous Convolution,
please consider citing

* SWideRNet:

```
@article{swidernet_2020,
  title={Scaling Wide Residual Networks for Panoptic Segmentation},
  author={Chen, Liang-Chieh and Wang, Huiyu and Qiao, Siyuan},
  journal={arXiv:2011.11675},
  year={2020}
}
```

* Switchable Atrous Convolution (SAC):

```
@inproceedings{detectors_2021,
  title={{DetectoRS}: Detecting Objects with Recursive Feature Pyramid and Switchable Atrous Convolution},
  author={Qiao, Siyuan and Chen, Liang-Chieh and Yuille, Alan},
  booktitle={CVPR},
  year={2021}
}
```

If you use the MobileNetV3 backbone, please consider citing

* MobileNetV3:

```
@inproceedings{howard2019searching,
  title={Searching for {MobileNetV3}},
  author={Howard, Andrew and Sandler, Mark and Chu, Grace and Chen, Liang-Chieh and Chen, Bo and Tan, Mingxing and Wang, Weijun and Zhu, Yukun and Pang, Ruoming and Vasudevan, Vijay and others},
  booktitle={ICCV},
  year={2019}
}
```

### References

1. Alexander Kirillov, Kaiming He, Ross Girshick, Carsten Rother, and Piotr
   Dollar. "Panoptic segmentation." In CVPR, 2019.
2. Alex Kendall, Yarin Gal, and Roberto Cipolla. "Multi-task learning using
   uncertainty to weigh losses for scene geometry and semantics." In CVPR,
   2018.
3. Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. "Deep residual
   learning for image recognition." In CVPR, 2016.
4. Sergey Zagoruyko and Nikos Komodakis. "Wide residual networks." In BMVC,
   2016.
5. Zifeng Wu, Chunhua Shen, and Anton Van Den Hengel. "Wider or deeper:
   Revisiting the ResNet model for visual recognition." Pattern Recognition,
   2019.
6. Tien-Ju Yang, Maxwell D Collins, Yukun Zhu, Jyh-Jing Hwang, Ting Liu,
   Xiao Zhang, Vivienne Sze, George Papandreou, and Liang-Chieh Chen.
   "DeeperLab: Single-shot image parser." arXiv:1902.05093, 2019.
7. Liang-Chieh Chen, Yukun Zhu, George Papandreou, Florian Schroff, and
   Hartwig Adam. "Encoder-decoder with atrous separable convolution for
   semantic image segmentation." In ECCV, 2018.
8. George Papandreou, Tyler Zhu, Liang-Chieh Chen, Spyros Gidaris,
   Jonathan Tompson, and Kevin Murphy. "PersonLab: Person pose estimation
   and instance segmentation with a bottom-up, part-based, geometric embedding
   model." In ECCV, 2018.
9. Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jon Shlens, and
   Zbigniew Wojna. "Rethinking the inception architecture for computer
   vision." In CVPR, 2016.
10. Jie Hu, Li Shen, and Gang Sun. "Squeeze-and-excitation networks."
    In CVPR, 2018.