# Run DeepLab2 on KITTI-STEP dataset
## KITTI-STEP dataset
KITTI-STEP extends the existing
[KITTI-MOTS](http://www.cvlibs.net/datasets/kitti/eval_mots.php) dataset with
spatially and temporally dense annotations. The KITTI-STEP dataset provides a
test-bed for studying long-term pixel-precise segmentation and tracking under
real-world conditions.
### Annotation
KITTI-STEP's annotation is collected in a semi-automatic manner. In the first
stage, pseudo semantic labels generated by the state-of-the-art
[Panoptic-DeepLab](../projects/panoptic_deeplab.md) are refined by human
annotators in at least one round. This new semantic segmentation annotation is
then merged with the existing tracking instance ground-truth from
[KITTI-MOTS](http://www.cvlibs.net/datasets/kitti/eval_mots.php). Please refer
to the following figure for an overview.
<p align="center">
<img src="../img/step/kitti_step_annotation.png" width=500>
</p>
### Label Map
KITTI-STEP adopts the same 19 classes as defined in
[Cityscapes](https://www.cityscapes-dataset.com/dataset-overview/#class-definitions)
with `pedestrians` and `cars` carefully annotated with track IDs. More
specifically, KITTI-STEP uses the following label-to-index mapping:

Label Name | Label ID
-------------- | --------
road | 0
sidewalk | 1
building | 2
wall | 3
fence | 4
pole | 5
traffic light | 6
traffic sign | 7
vegetation | 8
terrain | 9
sky | 10
person† | 11
rider | 12
car† | 13
truck | 14
bus | 15
train | 16
motorcycle | 17
bicycle | 18
void | 255
†: Single instance annotations are available.
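For convenience, the table above can be mirrored as Python constants. The names
below (`KITTI_STEP_LABELS`, `THING_CLASSES`) are illustrative only and not part
of the DeepLab2 API:

```python
# Illustrative constants mirroring the label map table (not part of DeepLab2).
KITTI_STEP_LABELS = {
    'road': 0, 'sidewalk': 1, 'building': 2, 'wall': 3, 'fence': 4,
    'pole': 5, 'traffic light': 6, 'traffic sign': 7, 'vegetation': 8,
    'terrain': 9, 'sky': 10, 'person': 11, 'rider': 12, 'car': 13,
    'truck': 14, 'bus': 15, 'train': 16, 'motorcycle': 17, 'bicycle': 18,
    'void': 255,
}

# Classes marked with a dagger carry single-instance (track ID) annotations.
THING_CLASSES = {KITTI_STEP_LABELS['person'], KITTI_STEP_LABELS['car']}
```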
### Prepare KITTI-STEP for Training and Evaluation
#### Download data
KITTI-STEP has the same train and test sequences as
[KITTI-MOTS](http://www.cvlibs.net/datasets/kitti/eval_mots.php) (21 sequences
for training and 29 for testing). Following KITTI-MOTS, the training sequences
are further split into a training set (12 sequences) and a validation set (9
sequences).
In the following, we provide a step-by-step walkthrough to prepare the data.

1. Create the KITTI-STEP directory:
```bash
mkdir -p ${KITTI_STEP_ROOT}/images
cd ${KITTI_STEP_ROOT}/images
```
2. Download KITTI images from their
[website](http://www.cvlibs.net/datasets/kitti/index.php) and unzip.
```bash
wget ${KITTI_LINK}
unzip ${KITTI_IMAGES}.zip
```
3. To prepare the dataset for our scripts, we need to move and rename some
directories:
```bash
mv testing/image_02/ test/
rm -r testing/
# Move all validation sequences:
mkdir val
mv training/image_02/0002 val/
mv training/image_02/0006 val/
mv training/image_02/0007 val/
mv training/image_02/0008 val/
mv training/image_02/0010 val/
mv training/image_02/0013 val/
mv training/image_02/0014 val/
mv training/image_02/0016 val/
mv training/image_02/0018 val/
# Move training sequences
mv training/image_02/ train/
rm -r training
```
4. Download groundtruth KITTI-STEP panoptic maps from
[here](http://storage.googleapis.com/gresearch/tf-deeplab/data/kitti-step.tar.gz).
```bash
# Go to ${KITTI_STEP_ROOT}
cd ..
wget http://storage.googleapis.com/gresearch/tf-deeplab/data/kitti-step.tar.gz
tar -xvf kitti-step.tar.gz
mv kitti-step/panoptic_maps panoptic_maps
rm -r kitti-step
```
The groundtruth panoptic map is encoded as follows in PNG format:
```
R = semantic_id
G = instance_id // 256
B = instance_id % 256
```
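For a quick sanity check, this encoding can be inverted in a few lines of
Python. The snippet below is an illustrative sketch (assuming NumPy and Pillow
are installed), not part of the DeepLab2 tooling:

```python
# Illustrative sketch: decode a KITTI-STEP panoptic map PNG into
# per-pixel semantic and instance IDs (assumes NumPy and Pillow).
import numpy as np
from PIL import Image

def decode_panoptic_map(png_path):
    """Returns (semantic_id, instance_id) int32 arrays of shape [H, W]."""
    rgb = np.asarray(Image.open(png_path), dtype=np.int32)
    semantic_id = rgb[..., 0]                      # R channel
    instance_id = rgb[..., 1] * 256 + rgb[..., 2]  # G * 256 + B
    return semantic_id, instance_id

# Hypothetical example path, following the layout described below:
# semantic, instance = decode_panoptic_map('panoptic_maps/train/0000/000000.png')
```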
Following the above guide, your data structure should look like this:
```
.(KITTI_STEP_ROOT)
+-- images
| |
| +-- train
| | |
| | +-- sequence_id (%04d)
| | |
| | +-- frame_id.png (%06d.png)
| | ...
| +-- val
| +-- test
|
+-- panoptic_maps
|
+-- train
| |
| +-- sequence_id (%04d)
| |
| +-- frame_id.png (%06d.png)
| ...
+-- val
```
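Before converting to tfrecords, it can be worth checking that every train/val
image has a matching panoptic map. The script below is a minimal sketch under
the directory layout above; the helper name `check_split` is ours, and it
assumes `KITTI_STEP_ROOT` is set in the environment:

```python
# Minimal sketch: verify that each train/val image has a panoptic map.
# Assumes the directory layout shown above and that KITTI_STEP_ROOT is set.
import os

def check_split(kitti_step_root, split):
    image_dir = os.path.join(kitti_step_root, 'images', split)
    map_dir = os.path.join(kitti_step_root, 'panoptic_maps', split)
    for sequence in sorted(os.listdir(image_dir)):
        frames = set(os.listdir(os.path.join(image_dir, sequence)))
        maps = set(os.listdir(os.path.join(map_dir, sequence)))
        missing = frames - maps
        if missing:
            print(f'{split}/{sequence}: {len(missing)} frames without maps')

for split in ('train', 'val'):
    check_split(os.environ['KITTI_STEP_ROOT'], split)
```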
#### Create tfrecord files
To create the dataset for training and evaluation, run the following command:
```bash
python deeplab2/data/build_step_data.py \
--step_root=${KITTI_STEP_ROOT} \
--output_dir=${OUTPUT_DIR}
```
This script outputs three sharded tfrecord files:
`{train|val|test}@10.tfrecord`. The `train` and `val` tfrecords contain the RGB
image pixels as well as their panoptic maps, while the `test` tfrecords contain
only the RGB images. These files will be used as the input for model training
and evaluation.
Optionally, you can also specify `--use_two_frames` to encode two consecutive
frames into the tfrecord files.
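To verify the conversion, you can count the records per split. The snippet
below is a rough sketch: it assumes TensorFlow is installed and reads
`OUTPUT_DIR` from the environment, and the glob pattern is an assumption, so
adjust it to the shard names `build_step_data.py` actually produced:

```python
# Rough sanity check: count shards and records per split.
# The glob pattern is an assumption -- adjust to your actual shard names.
import os
import tensorflow as tf

output_dir = os.environ.get('OUTPUT_DIR', '.')
for split in ('train', 'val', 'test'):
    shards = tf.io.gfile.glob(os.path.join(output_dir, f'{split}*.tfrecord*'))
    num_records = sum(1 for _ in tf.data.TFRecordDataset(shards))
    print(f'{split}: {len(shards)} shards, {num_records} records')
```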
## Citing KITTI-STEP
If you find this dataset helpful in your research, please use the following
BibTeX entry.
```
@article{step_2021,
author={Mark Weber and Jun Xie and Maxwell Collins and Yukun Zhu and Paul Voigtlaender and Hartwig Adam and Bradley Green and Andreas Geiger and Bastian Leibe and Daniel Cremers and Aljosa Osep and Laura Leal-Taixe and Liang-Chieh Chen},
title={{STEP}: Segmenting and Tracking Every Pixel},
journal={arXiv:2102.11859},
year={2021}
}
```