# MaX-DeepLab
MaX-DeepLab is the first fully **end-to-end** method for panoptic segmentation
[1], removing the need for previously hand-designed priors such as object
bounding boxes (used in DETR [2]), instance centers (used in Panoptic-DeepLab
[3]), non-maximum suppression, thing-stuff merging, *etc*.

The goal of panoptic segmentation is to predict a set of non-overlapping masks
along with their corresponding class labels (e.g., person, car, road, sky).
MaX-DeepLab achieves this goal directly by predicting a set of class-labeled
masks with a mask transformer.
<p align="center">
<img src="../img/max_deeplab/overview_simple.png" width=450>
</p>
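To make the output format concrete, here is a minimal NumPy sketch (illustrative
only, not this repository's inference code; the shapes, the softmax-over-masks
decoding, and the function name `decode_panoptic` are assumptions) of how a set
of predicted masks with per-mask class scores can be turned into a
non-overlapping panoptic prediction by letting the masks compete for each pixel:

```python
import numpy as np


def decode_panoptic(mask_logits, class_probs):
  """Decodes a set of class-labeled masks into a non-overlapping prediction.

  Args:
    mask_logits: float array [N, H, W], one logit map per predicted mask.
    class_probs: float array [N, C], class distribution per mask (one class
      may be reserved for "no object"; ignored in this toy sketch).

  Returns:
    mask_id: int array [H, W], index of the winning mask at each pixel.
    mask_class: int array [N], most likely class label for each mask.
  """
  # A softmax over the N masks makes them compete for every pixel, so the
  # resulting pixel-to-mask assignment is non-overlapping by construction.
  mask_probs = np.exp(mask_logits - mask_logits.max(axis=0, keepdims=True))
  mask_probs /= mask_probs.sum(axis=0, keepdims=True)
  mask_id = mask_probs.argmax(axis=0)       # [H, W]
  mask_class = class_probs.argmax(axis=-1)  # [N]
  return mask_id, mask_class


# Toy usage: N=3 masks, C=4 classes, on an 8x8 image of random predictions.
rng = np.random.default_rng(0)
mask_id, mask_class = decode_panoptic(
    rng.normal(size=(3, 8, 8)), rng.random(size=(3, 4)))
```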
The mask transformer is trained end-to-end with a loss function inspired by
panoptic quality (PQ): predicted masks are matched to the ground-truth masks
and optimized with a PQ-style similarity metric. In addition, our proposed mask
transformer introduces a global memory path alongside the pixel-path CNN and
employs all four types of attention between the two paths, allowing the CNN to
read and write the global memory at any layer.
<p align="center">
<img src="../img/max_deeplab/overview.png" width=500>
</p>
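To illustrate the PQ-style similarity at a glance, the sketch below is a
simplified, illustrative formulation (the helper names and the soft-Dice form
are assumptions, not the loss implemented in this repository): the similarity
between one predicted class-labeled mask and one ground-truth mask is the
product of a classification term and a mask-overlap term, mirroring
PQ = RQ x SQ.

```python
import numpy as np


def dice(pred_mask, gt_mask, eps=1e-6):
  """Dice overlap between a soft predicted mask and a binary ground-truth mask."""
  intersection = (pred_mask * gt_mask).sum()
  return 2.0 * intersection / (pred_mask.sum() + gt_mask.sum() + eps)


def pq_style_similarity(class_probs, pred_mask, gt_class, gt_mask):
  """Classification term (probability of the correct class) times a mask
  overlap term (Dice), mirroring PQ = RQ x SQ."""
  return class_probs[gt_class] * dice(pred_mask, gt_mask)


# Toy example: a confidently and correctly classified mask with near-perfect
# overlap receives a similarity close to its class probability.
gt_mask = np.zeros((8, 8))
gt_mask[2:6, 2:6] = 1.0
pred_mask = gt_mask * 0.95                # near-perfect soft mask
class_probs = np.array([0.1, 0.8, 0.1])   # ground-truth class is index 1
print(pq_style_similarity(class_probs, pred_mask, gt_class=1, gt_mask=gt_mask))
```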
## Prerequisites
1. Make sure the software is properly [installed](../setup/installation.md).
2. Make sure the target dataset is correctly prepared (e.g.,
   [COCO](../setup/coco.md)).
3. Download the ImageNet pretrained
   [checkpoints](./imagenet_pretrained_checkpoints.md), and update the
   `initial_checkpoint` path in the config files (a minimal sketch of this step
   is shown below).
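The snippet below is only a sketch of step 3. Assumptions to verify against
your checkout: the compiled config proto is importable as `deeplab2.config_pb2`,
and the pretrained weights are referenced via `model_options.initial_checkpoint`,
as in the released textproto configs.

```python
# A sketch for step 3, run from the repository root.
from google.protobuf import text_format
from deeplab2 import config_pb2  # assumption: compiled config proto module

config_path = 'configs/coco/max_deeplab/max_deeplab_s_os16_res641_100k.textproto'
with open(config_path) as f:
  experiment = text_format.Parse(f.read(), config_pb2.ExperimentOptions())

# Point the model at the downloaded ImageNet-pretrained checkpoint.
experiment.model_options.initial_checkpoint = '/path/to/imagenet_pretrained/ckpt'

with open(config_path, 'w') as f:
  f.write(text_format.MessageToString(experiment))
```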
## Model Zoo
We explore MaX-DeepLab model variants that are built on top of several
backbones (e.g., ResNet model variants [4]).

1. **MaX-DeepLab-S** replaces the last two stages of ResNet-50-beta with
   axial-attention blocks and applies a small dual-path transformer.
   (ResNet-50-beta replaces the ResNet-50 stem with the Inception stem [5].)
   A rough summary of this composition is sketched below.
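The mapping below is only a descriptive summary of that composition; the labels
are illustrative and do not correspond to identifiers in this codebase.

```python
# Descriptive only; these labels are not identifiers from this codebase.
MAX_DEEPLAB_S_LAYOUT = {
    'stem': 'Inception stem (from ResNet-50-beta [5])',
    'stages 1-3': 'standard ResNet-50 bottleneck blocks [4]',
    'stages 4-5': 'axial-attention blocks replacing the last two ResNet stages',
    'head': 'small dual-path transformer over the pixel and memory paths',
}
```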
### COCO Panoptic Segmentation
We provide checkpoints trained on the COCO 2017 panoptic train set and
evaluated on the val set. If you would like to train these models yourself,
please find the corresponding config files under
[configs/coco/max_deeplab](../../configs/coco/max_deeplab); an example launch
command is sketched after the table below.

All the reported results are obtained with *single-scale* inference and
*ImageNet-1K* pretrained checkpoints.

Model | Input Resolution | Training Steps | PQ \[\*\] | PQ<sup>thing</sup> \[\*\] | PQ<sup>stuff</sup> \[\*\] | PQ \[\*\*\]
--- | :---: | :---: | :---: | :---: | :---: | :---:
MaX-DeepLab-S ([config](../../configs/coco/max_deeplab/max_deeplab_s_os16_res641_100k.textproto), [ckpt](https://storage.googleapis.com/gresearch/tf-deeplab/checkpoint/max_deeplab_s_os16_res641_100k_coco_train.tar.gz)) | 641 x 641 | 100k | 45.9 | 49.2 | 40.9 | 46.36
MaX-DeepLab-S ([config](../../configs/coco/max_deeplab/max_deeplab_s_os16_res641_200k.textproto), [ckpt](https://storage.googleapis.com/gresearch/tf-deeplab/checkpoint/max_deeplab_s_os16_res641_200k_coco_train.tar.gz)) | 641 x 641 | 200k | 46.5 | 50.6 | 40.4 | 47.04
MaX-DeepLab-S ([config](../../configs/coco/max_deeplab/max_deeplab_s_os16_res641_400k.textproto), [ckpt](https://storage.googleapis.com/gresearch/tf-deeplab/checkpoint/max_deeplab_s_os16_res641_400k_coco_train.tar.gz)) | 641 x 641 | 400k | 47.0 | 51.3 | 40.5 | 47.56
MaX-DeepLab-S ([config](../../configs/coco/max_deeplab/max_deeplab_s_os16_res1025_100k.textproto), [ckpt](https://storage.googleapis.com/gresearch/tf-deeplab/checkpoint/max_deeplab_s_os16_res1025_100k_coco_train.tar.gz)) | 1025 x 1025 | 100k | 47.9 | 52.1 | 41.5 | 48.41
MaX-DeepLab-S ([config](../../configs/coco/max_deeplab/max_deeplab_s_os16_res1025_200k.textproto), [ckpt](https://storage.googleapis.com/gresearch/tf-deeplab/checkpoint/max_deeplab_s_os16_res1025_200k_coco_train.tar.gz)) | 1025 x 1025 | 200k | 48.7 | 53.6 | 41.3 | 49.23
\[\*\]: Results evaluated by the official script. \[\*\*\]: Results evaluated by
our pipeline. See Q4 in [FAQ](../faq.md).
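If you want to reproduce one of the rows above, a launch along the following
lines should work. This is only a sketch based on the deeplab2 getting-started
guide: the `trainer/train.py` entry point, its flags, and the chosen
`--model_dir` are assumptions to verify against your checkout.

```python
# A sketch, run from the repository root.
import subprocess

subprocess.run([
    'python', 'trainer/train.py',
    '--config_file=configs/coco/max_deeplab/max_deeplab_s_os16_res641_100k.textproto',
    '--mode=train',
    '--model_dir=/tmp/max_deeplab_s',  # hypothetical output directory
    '--num_gpus=1',
], check=True)
```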
Note that the results differ slightly from the paper because of the following
implementation differences:

1. Stronger pretrained checkpoints are used in this repo.
2. A `linear` drop path schedule is used, rather than a `constant` schedule
   (see the toy sketch below).
3. For simplicity, Adam [6] is used without weight decay, rather than RAdam [7]
   + Lookahead [8] with weight decay.
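For reference, here is a toy sketch of the second point, assuming (as in the
stochastic-depth formulation) that a `linear` schedule ramps the per-block drop
path rate with depth while `constant` applies one rate everywhere; the exact
schedule used in this codebase may differ in details.

```python
def drop_path_rates(num_blocks, max_rate=0.2, schedule='linear'):
  """Toy per-block drop path rates for a 'constant' vs 'linear' schedule.

  Under 'linear', shallow blocks are dropped less often and the deepest block
  reaches `max_rate`; under 'constant', every block uses `max_rate`.
  """
  if schedule == 'constant':
    return [max_rate] * num_blocks
  return [max_rate * i / max(num_blocks - 1, 1) for i in range(num_blocks)]


print(drop_path_rates(4, schedule='constant'))  # [0.2, 0.2, 0.2, 0.2]
print(drop_path_rates(4, schedule='linear'))    # [0.0, 0.0666..., 0.1333..., 0.2]
```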
## Citing MaX-DeepLab

If you find this code helpful in your research or wish to refer to the baseline
results, please use the following BibTeX entries.
* MaX-DeepLab:

```
@inproceedings{max_deeplab_2021,
  author={Huiyu Wang and Yukun Zhu and Hartwig Adam and Alan Yuille and Liang-Chieh Chen},
  title={{MaX-DeepLab}: End-to-End Panoptic Segmentation with Mask Transformers},
  booktitle={CVPR},
  year={2021}
}
```

* Axial-DeepLab:

```
@inproceedings{axial_deeplab_2020,
  author={Huiyu Wang and Yukun Zhu and Bradley Green and Hartwig Adam and Alan Yuille and Liang-Chieh Chen},
  title={{Axial-DeepLab}: Stand-Alone Axial-Attention for Panoptic Segmentation},
  booktitle={ECCV},
  year={2020}
}
```
### References

1. Alexander Kirillov, Kaiming He, Ross Girshick, Carsten Rother, and Piotr
   Dollar. "Panoptic segmentation." In CVPR, 2019.
2. Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier,
   Alexander Kirillov, and Sergey Zagoruyko. "End-to-End Object Detection with
   Transformers." In ECCV, 2020.
3. Bowen Cheng, Maxwell D. Collins, Yukun Zhu, Ting Liu, Thomas S. Huang,
   Hartwig Adam, and Liang-Chieh Chen. "Panoptic-DeepLab: A Simple, Strong, and
   Fast Baseline for Bottom-Up Panoptic Segmentation." In CVPR, 2020.
4. Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. "Deep residual
   learning for image recognition." In CVPR, 2016.
5. Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jon Shlens, and Zbigniew
   Wojna. "Rethinking the inception architecture for computer vision." In
   CVPR, 2016.
6. Diederik P. Kingma, and Jimmy Ba. "Adam: A Method for Stochastic
   Optimization." In ICLR, 2015.
7. Liyuan Liu, Haoming Jiang, Pengcheng He, Weizhu Chen, Xiaodong Liu, Jianfeng
   Gao, and Jiawei Han. "On the Variance of the Adaptive Learning Rate and
   Beyond." In ICLR, 2020.
8. Michael R. Zhang, James Lucas, Geoffrey Hinton, and Jimmy Ba. "Lookahead
   Optimizer: k steps forward, 1 step back." In NeurIPS, 2019.