# Panoptic-DeepLab
Panoptic-DeepLab is a state-of-the-art **box-free** system for panoptic
segmentation [1], where the goal is to assign a unique value, encoding both
semantic label (e.g., person, car) and instance ID (e.g., instance_1,
instance_2), to every pixel in an image.
Panoptic-DeepLab improves over DeeperLab [6], one of the first box-free systems
for panoptic segmentation (combining DeepLabv3+ [7] and PersonLab [8]), by
simplifying the class-agnostic instance detection to use only a center
keypoint. As a result, Panoptic-DeepLab predicts three outputs: (1) semantic
segmentation, (2) instance center heatmap, and (3) instance center regression.
The class-agnostic instance segmentation is first obtained by grouping the
predicted foreground pixels (inferred from semantic segmentation) to their
closest predicted instance centers [2]. To generate the final panoptic
segmentation, we then fuse the class-agnostic instance segmentation with the
semantic segmentation by an efficient majority-vote scheme [6] (a simplified
sketch of this post-processing follows the figure below).
<p align="center">
<img src="../img/panoptic_deeplab.png" width=800>
</p>
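To make the grouping and fusion steps concrete, below is a simplified NumPy
sketch of the post-processing described above. It is an illustration only, not
the deeplab2 implementation: the function name is hypothetical, `centers` is
assumed to be already extracted from the center heatmap (e.g., via thresholding
and max-pooling-based non-maximum suppression), and the
`semantic * label_divisor + instance` packing follows the convention used in
this codebase.

```python
import numpy as np

def simple_panoptic_fusion(semantic, centers, offsets, thing_ids,
                           label_divisor=1000):
  """Groups 'thing' pixels to their closest instance center, then fuses
  instances with semantic segmentation by majority vote.

  Args:
    semantic: (H, W) int array of predicted semantic labels.
    centers: (K, 2) float array of predicted instance centers as (y, x).
    offsets: (H, W, 2) float array of center-regression offsets (dy, dx).
    thing_ids: iterable of semantic ids treated as countable 'things'.
    label_divisor: packs ids as semantic * label_divisor + instance.

  Returns:
    (H, W) int array of panoptic ids.
  """
  h, w = semantic.shape
  ys, xs = np.mgrid[0:h, 0:w]
  # Each pixel votes for an instance by adding its regressed offset to its
  # own coordinates, then picking the nearest predicted center.
  regressed = np.stack([ys + offsets[..., 0], xs + offsets[..., 1]], axis=-1)
  dists = np.linalg.norm(regressed[:, :, None, :] - centers[None, None],
                         axis=-1)                     # (H, W, K)
  instance = dists.argmin(-1).astype(np.int64) + 1    # instance ids start at 1
  instance[~np.isin(semantic, list(thing_ids))] = 0   # 'stuff' has no instance
  panoptic = semantic.astype(np.int64) * label_divisor
  for ins_id in range(1, len(centers) + 1):
    mask = instance == ins_id
    if mask.any():
      # Majority vote: the instance adopts its most frequent semantic label.
      majority = np.bincount(semantic[mask]).argmax()
      panoptic[mask] = majority * label_divisor + ins_id
  return panoptic
```

In the exported models this logic runs as part of the graph; the sketch trades
the efficiency tricks of the real post-processing (e.g., operating only on
top-k centers) for readability.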
## Prerequisites
1. Make sure the software is properly [installed](../setup/installation.md).
2. Make sure the target dataset is correctly prepared (e.g.,
[Cityscapes](../setup/cityscapes.md), [COCO](../setup/coco.md)).
3. Download the ImageNet pretrained
[checkpoints](./imagenet_pretrained_checkpoints.md), and update the
`initial_checkpoint` path in the config files.
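The relevant field is `initial_checkpoint` under `model_options`. The snippet
below is a rough illustration with a placeholder path; the surrounding fields
differ from config to config:

```
model_options {
  # Update this to the downloaded ImageNet-pretrained checkpoint path.
  initial_checkpoint: "/path/to/imagenet/pretrained/ckpt"
  ...
}
```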
## Model Zoo
In the Model Zoo, we explore building Panoptic-DeepLab on top of several
backbones (e.g., ResNet model variants [3]).
Herein, we highlight some of the employed backbones:
1. **ResNet-50-Beta**: We replace the original stem in ResNet-50 [3] with the
   Inception stem [9], i.e., the original 7x7 convolution is replaced by three
   3x3 convolutions (see the sketch after this list).
2. **Wide-ResNet-41**: We modify the Wide-ResNet-38 [5] by (1) removing the
last residual block, and (2) repeating the second last residual block two
more times.
3. **SWideRNet-SAC-(1, 1, x)**, where $$x \in \{1, 3, 4.5\}$$: the backbone
   layers (excluding the stem) of Wide-ResNet-41 are scaled by a factor of x.
   This backbone employs only the Switchable Atrous Convolution (SAC), without
   the Squeeze-and-Excitation modules [10].
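To make the stem replacement in ResNet-50-Beta concrete, here is a minimal
`tf.keras` sketch of an Inception-style stem; the function names and channel
sizes are illustrative assumptions rather than values read from the deeplab2
code.

```python
import tensorflow as tf

def conv_bn_relu(x, filters, strides=1):
  """Applies a 3x3 convolution followed by batch norm and ReLU."""
  x = tf.keras.layers.Conv2D(
      filters, 3, strides=strides, padding='same', use_bias=False)(x)
  x = tf.keras.layers.BatchNormalization()(x)
  return tf.keras.layers.ReLU()(x)

def inception_style_stem(inputs):
  """Replaces the single 7x7 stride-2 stem convolution with three 3x3 convs."""
  x = conv_bn_relu(inputs, 64, strides=2)  # keeps the stride-2 downsampling
  x = conv_bn_relu(x, 64)
  return conv_bn_relu(x, 128)
```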
### Cityscapes Panoptic Segmentation
We provide checkpoints trained on the Cityscapes train-fine set below. If you
would like to train those models by yourself, please find the corresponding
config files under the directory
[configs/cityscapes/panoptic_deeplab](../../configs/cityscapes/panoptic_deeplab).
All reported results are obtained with *single-scale* inference and
*ImageNet-1K* pretrained checkpoints.
Backbone | Output stride | Input resolution | PQ [*] | mIoU [*] | PQ [**] | mIoU [**] | AP<sup>Mask</sup> [**]
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | :-----------: | :---------------: | :----: | :------: | :-----: | :-------: | :--------------------:
MobilenetV3-S ([config](../../configs/cityscapes/panoptic_deeplab/mobilenet_v3_small_os32.textproto), [ckpt](https://storage.googleapis.com/gresearch/tf-deeplab/checkpoint/mobilenet_v3_small_os32_panoptic_deeplab_cityscapes_trainfine.tar.gz)) | 32 | 1025 x 2049 | 46.7 | 69.5 | 46.92 | 69.8 | 16.53
MobilenetV3-L ([config](../../configs/cityscapes/panoptic_deeplab/mobilenet_v3_large_os32.textproto), [ckpt](https://storage.googleapis.com/gresearch/tf-deeplab/checkpoint/mobilenet_v3_large_os32_panoptic_deeplab_cityscapes_trainfine.tar.gz)) | 32 | 1025 x 2049 | 52.7 | 73.8 | 53.07 | 74.15 | 22.58
ResNet-50 ([config](../../configs/cityscapes/panoptic_deeplab/resnet50_os32_merge_with_pure_tf_func.textproto), [ckpt](https://storage.googleapis.com/gresearch/tf-deeplab/checkpoint/resnet50_os32_panoptic_deeplab_cityscapes_trainfine.tar.gz)) | 32 | 1025 x 2049 | 59.8 | 76.0 | 60.24 | 76.36 | 30.01
ResNet-50-Beta ([config](../../configs/cityscapes/panoptic_deeplab/resnet50_beta_os32.textproto), [ckpt](https://storage.googleapis.com/gresearch/tf-deeplab/checkpoint/resnet50_beta_os32_panoptic_deeplab_cityscapes_trainfine.tar.gz)) | 32 | 1025 x 2049 | 60.8 | 77.0 | 61.16 | 77.37 | 31.58
Wide-ResNet-41 ([config](../../configs/cityscapes/panoptic_deeplab/wide_resnet41_os16.textproto), [ckpt](https://storage.googleapis.com/gresearch/tf-deeplab/checkpoint/wide_resnet41_os16_panoptic_deeplab_cityscapes_trainfine.tar.gz)) | 16 | 1025 x 2049 | 64.4 | 81.5 | 64.83 | 81.92 | 36.07
SWideRNet-SAC-(1, 1, 1) ([config](../../configs/cityscapes/panoptic_deeplab/swidernet_sac_1_1_1_os16.textproto), [ckpt](https://storage.googleapis.com/gresearch/tf-deeplab/checkpoint/swidernet_sac_1_1_1_os16_panoptic_deeplab_cityscapes_trainfine.tar.gz)) | 16 | 1025 x 2049 | 64.3 | 81.8 | 64.81 | 82.24 | 36.80
SWideRNet-SAC-(1, 1, 3) ([config](../../configs/cityscapes/panoptic_deeplab/swidernet_sac_1_1_3_os16.textproto), [ckpt](https://storage.googleapis.com/gresearch/tf-deeplab/checkpoint/swidernet_sac_1_1_3_os16_panoptic_deeplab_cityscapes_trainfine.tar.gz)) | 16 | 1025 x 2049 | 66.6 | 82.1 | 67.05 | 82.67 | 38.59
SWideRNet-SAC-(1, 1, 4.5) ([config](../../configs/cityscapes/panoptic_deeplab/swidernet_sac_1_1_4.5_os16.textproto), [ckpt](https://storage.googleapis.com/gresearch/tf-deeplab/checkpoint/swidernet_sac_1_1_4.5_os16_panoptic_deeplab_cityscapes_trainfine.tar.gz)) | 16 | 1025 x 2049 | 66.8 | 82.2 | 67.29 | 82.74 | 39.51
\[*]: Results evaluated by the official script. Instance segmentation
evaluation is not yet supported (our prediction format needs to be converted
first).

\[**]: Results evaluated by our pipeline. See Q4 in [FAQ](../faq.md).
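Both evaluation paths start from a panoptic map that packs the two labels into
a single integer per pixel (the inverse of the packing in the fusion sketch
above). A minimal sketch of unpacking it; the `label_divisor=1000` default is
an assumption, so use the value from your dataset configuration:

```python
def split_panoptic(panoptic, label_divisor=1000):
  """Splits a packed panoptic map into semantic and instance id maps."""
  semantic = panoptic // label_divisor  # per-pixel semantic class id
  instance = panoptic % label_divisor   # per-pixel instance id (0 for 'stuff')
  return semantic, instance
```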
### COCO Panoptic Segmentation
We provide checkpoints trained on the COCO train set below. If you would like to
train those models by yourself, please find the corresponding config files under
the directory
[configs/coco/panoptic_deeplab](../../configs/coco/panoptic_deeplab).
All reported results are obtained with *single-scale* inference and
*ImageNet-1K* pretrained checkpoints.
Backbone | Output stride | Input resolution | PQ [*] | PQ [**] | mIoU [**] | AP<sup>Mask</sup> [**]
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | :-----------: | :---------------: | :----: | :-----: | :-------: | :--------------------:
ResNet-50 ([config](../../configs/coco/panoptic_deeplab/resnet50_os32.textproto), [ckpt](https://storage.googleapis.com/gresearch/tf-deeplab/checkpoint/resnet50_os32_panoptic_deeplab_coco_train_2.tar.gz)) | 32 | 641 x 641 | 34.1 | 34.60 | 54.75 | 18.50
ResNet-50-Beta ([config](../../configs/coco/panoptic_deeplab/resnet50_beta_os32.textproto), [ckpt](https://storage.googleapis.com/gresearch/tf-deeplab/checkpoint/resnet50beta_os32_panoptic_deeplab_coco_train.tar.gz)) | 32 | 641 x 641 | 34.6 | 35.10 | 54.98 | 19.24
ResNet-50 ([config](../../configs/coco/panoptic_deeplab/resnet50_os16.textproto), [ckpt](https://storage.googleapis.com/gresearch/tf-deeplab/checkpoint/resnet50_os16_panoptic_deeplab_coco_train.tar.gz)) | 16 | 641 x 641 | 35.1 | 35.67 | 55.52 | 19.40
ResNet-50-Beta ([config](../../configs/coco/panoptic_deeplab/resnet50_beta_os16.textproto), [ckpt](https://storage.googleapis.com/gresearch/tf-deeplab/checkpoint/resnet50beta_os16_panoptic_deeplab_coco_train.tar.gz)) | 16 | 641 x 641 | 35.2 | 35.76 | 55.45 | 19.63
\[*]: Results evaluated by the official script.
\[**]: Results evaluated by our pipeline. See Q4 in [FAQ](../faq.md).
## Citing Panoptic-DeepLab
If you find this code helpful in your research or wish to refer to the baseline
results, please use the following BibTeX entry.
* Panoptic-DeepLab:
```
@inproceedings{panoptic_deeplab_2020,
author={Bowen Cheng and Maxwell D Collins and Yukun Zhu and Ting Liu and Thomas S Huang and Hartwig Adam and Liang-Chieh Chen},
title={{Panoptic-DeepLab}: A Simple, Strong, and Fast Baseline for Bottom-Up Panoptic Segmentation},
booktitle={CVPR},
year={2020}
}
```
If you use the Wide-ResNet-41 backbone, please consider citing
* Naive-Student:
```
@inproceedings{naive_student_2020,
title={{Naive-Student: Leveraging Semi-Supervised Learning in Video Sequences for Urban Scene Segmentation}},
author={Chen, Liang-Chieh and Lopes, Raphael Gontijo and Cheng, Bowen and Collins, Maxwell D and Cubuk, Ekin D and Zoph, Barret and Adam, Hartwig and Shlens, Jonathon},
booktitle={ECCV},
year={2020}
}
```
If you use the SWideRNet backbone with Switchable Atrous Convolution, please
consider citing
* SWideRNet:
```
@article{swidernet_2020,
title={Scaling Wide Residual Networks for Panoptic Segmentation},
author={Chen, Liang-Chieh and Wang, Huiyu and Qiao, Siyuan},
journal={arXiv:2011.11675},
year={2020}
}
```
* Switchable Atrous Convolution (SAC):
```
@inproceedings{detectors_2021,
title={{DetectoRS}: Detecting Objects with Recursive Feature Pyramid and Switchable Atrous Convolution},
author={Qiao, Siyuan and Chen, Liang-Chieh and Yuille, Alan},
booktitle={CVPR},
year={2021}
}
```
If you use the MobileNetV3 backbone, please consider citing
* MobileNetV3:
```
@inproceedings{howard2019searching,
title={Searching for {MobileNetV3}},
author={Howard, Andrew and Sandler, Mark and Chu, Grace and Chen, Liang-Chieh and Chen, Bo and Tan, Mingxing and Wang, Weijun and Zhu, Yukun and Pang, Ruoming and Vasudevan, Vijay and others},
booktitle={ICCV},
year={2019}
}
```
### References
1. Alexander Kirillov, Kaiming He, Ross Girshick, Carsten Rother, and Piotr
Dollar. "Panoptic segmentation." In CVPR, 2019.
2. Alex Kendall, Yarin Gal, and Roberto Cipolla. "Multi-task learning using
uncertainty to weigh losses for scene geometry and semantics." In CVPR, 2018.
3. Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. "Deep residual
learning for image recognition." In CVPR, 2016.
4. Sergey Zagoruyko and Nikos Komodakis. "Wide residual networks." In BMVC,
2016.
5. Zifeng Wu, Chunhua Shen, and Anton Van Den Hengel. "Wider or deeper:
Revisiting the ResNet model for visual recognition." Pattern Recognition,
2019.
6. Tien-Ju Yang, Maxwell D Collins, Yukun Zhu, Jyh-Jing Hwang, Ting Liu,
Xiao Zhang, Vivienne Sze, George Papandreou, and Liang-Chieh Chen.
"DeeperLab: Single-shot image parser." arXiv:1902.05093, 2019.
7. Liang-Chieh Chen, Yukun Zhu, George Papandreou, Florian Schroff, and
Hartwig Adam. "Encoder-decoder with atrous separable convolution for
semantic image segmentation." In ECCV, 2018.
8. George Papandreou, Tyler Zhu, Liang-Chieh Chen, Spyros Gidaris,
Jonathan Tompson, and Kevin Murphy. "PersonLab: Person pose estimation
and instance segmentation with a bottom-up, part-based, geometric embedding
model." In ECCV, 2018.
9. Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jon Shlens, and
Zbigniew Wojna. "Rethinking the inception architecture for computer
vision." In CVPR, 2016.
10. Jie Hu, Li Shen, and Gang Sun. "Squeeze-and-excitation networks."
In CVPR, 2018.