# Run DeepLab2 on KITTI-STEP dataset

## KITTI-STEP dataset

KITTI-STEP extends the existing
[KITTI-MOTS](http://www.cvlibs.net/datasets/kitti/eval_mots.php) dataset with
spatially and temporally dense annotations. The KITTI-STEP dataset provides a
test-bed for studying long-term, pixel-precise segmentation and tracking under
real-world conditions.

### Annotation

KITTI-STEP's annotations are collected in a semi-automatic manner. In the
first stage, pseudo semantic labels generated by the state-of-the-art
[Panoptic-DeepLab](../projects/panoptic_deeplab.md) are refined by human
annotators for at least one round. The resulting semantic segmentation
annotations are then merged with the existing instance tracking ground-truth
from [KITTI-MOTS](http://www.cvlibs.net/datasets/kitti/eval_mots.php). The
following figure gives an overview of this pipeline.

<p align="center">
   <img src="../img/step/kitti_step_annotation.png" width=500>
</p>

### Label Map

KITTI-STEP adopts the same 19 classes as defined in
[Cityscapes](https://www.cityscapes-dataset.com/dataset-overview/#class-definitions),
with `pedestrians` and `cars` carefully annotated with track IDs. More
specifically, KITTI-STEP uses the following label-to-index mapping:

Label Name     | Label ID
-------------- | --------
road           | 0
sidewalk       | 1
building       | 2
wall           | 3
fence          | 4
pole           | 5
traffic light  | 6
traffic sign   | 7
vegetation     | 8
terrain        | 9
sky            | 10
person&dagger; | 11
rider          | 12
car&dagger;    | 13
truck          | 14
bus            | 15
train          | 16
motorcycle     | 17
bicycle        | 18
void           | 255

&dagger;: Instance-level annotations (with track IDs) are available for these classes.

### Prepare KITTI-STEP for Training and Evaluation

#### Download data

KITTI-STEP has the same train and test sequences as
[KITTI-MOTS](http://www.cvlibs.net/datasets/kitti/eval_mots.php) (21 training
and 29 testing sequences). Following KITTI-MOTS, the training sequences are
further split into a training set (12 sequences) and a validation set (9
sequences).

In the following, we provide a step-by-step walkthrough to prepare the data.

1.  Create the KITTI-STEP directory:

    ```bash
    mkdir -p ${KITTI_STEP_ROOT}/images
    cd ${KITTI_STEP_ROOT}/images
    ```

2.  Download the KITTI images from the
    [website](http://www.cvlibs.net/datasets/kitti/index.php) and unzip them:

    ```bash
    wget ${KITTI_LINK}
    unzip ${KITTI_IMAGES}.zip
    ```

3.  To prepare the dataset for our scripts, we need to move and rename some
    directories:

    ```bash
    mv testing/image_02/ test/
    rm -r testing/

    # Move all validation sequences:
    mkdir val
    mv training/image_02/0002 val/
    mv training/image_02/0006 val/
    mv training/image_02/0007 val/
    mv training/image_02/0008 val/
    mv training/image_02/0010 val/
    mv training/image_02/0013 val/
    mv training/image_02/0014 val/
    mv training/image_02/0016 val/
    mv training/image_02/0018 val/

    # Move training sequences
    mv training/image_02/ train/
    rm -r training
    ```

4.  Download groundtruth KITTI-STEP panoptic maps from
    [here](http://storage.googleapis.com/gresearch/tf-deeplab/data/kitti-step.tar.gz).

    ```bash
    # Go back to ${KITTI_STEP_ROOT}
    cd ..

    wget http://storage.googleapis.com/gresearch/tf-deeplab/data/kitti-step.tar.gz
    tar -xvf kitti-step.tar.gz
    mv kitti-step/panoptic_maps panoptic_maps
    rm -r kitti-step
    ```

The groundtruth panoptic maps are encoded in PNG format as follows:

```
R = semantic_id
G = instance_id // 256
B = instance_id % 256
```
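
For illustration, this encoding can be inverted with a few lines of NumPy (a
minimal sketch; the file path is just an example following the directory
layout shown below):

```python
import numpy as np
from PIL import Image

# Load one groundtruth panoptic map, e.g. the first frame of sequence 0000.
panoptic = np.asarray(
    Image.open('panoptic_maps/train/0000/000000.png'), dtype=np.int32)

# Invert the encoding described above.
semantic_id = panoptic[..., 0]                           # R channel
instance_id = panoptic[..., 1] * 256 + panoptic[..., 2]  # G * 256 + B

# Pixels with semantic_id == 255 belong to the void class.
```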

Following the above guide, your directory structure should look like this:

```
.(KITTI_STEP_ROOT)
+-- images
|   |
|   +-- train
|   |   |
|   |   +-- sequence_id (%04d)
|   |       |
|   |       +-- frame_id.png (%06d.png)
|   |   ...
|   +-- val
|   +-- test
|
+-- panoptic_maps
     |
     +-- train
     |   |
     |   +-- sequence_id (%04d)
     |       |
     |       +-- frame_id.png (%06d.png)
     |   ...
     +-- val
```
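
If you would like to verify the layout programmatically, a minimal (purely
illustrative) sanity check could look like this:

```python
import os

# Adjust to your actual ${KITTI_STEP_ROOT}.
kitti_step_root = os.environ.get('KITTI_STEP_ROOT', '.')

# Panoptic maps are only provided for the train and val splits.
expected_dirs = [
    'images/train', 'images/val', 'images/test',
    'panoptic_maps/train', 'panoptic_maps/val',
]
for rel_path in expected_dirs:
    path = os.path.join(kitti_step_root, rel_path)
    print('ok     ' if os.path.isdir(path) else 'MISSING', path)
```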

#### Create tfrecord files

To create the dataset for training and evaluation, run the following command:

```bash
python deeplab2/data/build_step_data.py \
  --step_root=${KITTI_STEP_ROOT} \
  --output_dir=${OUTPUT_DIR}
```

This script outputs three sets of sharded tfrecord files:
`{train|val|test}@10.tfrecord`. For the `train` and `val` sets, the tfrecords
contain the RGB image pixels as well as the corresponding panoptic maps; for
the `test` set, they contain RGB images only. These files will be used as the
input for model training and evaluation.
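
As a quick sanity check, you can count the records that were written. The
snippet below is only a sketch: the exact shard file names are an assumption,
so adjust the glob pattern to match the files in your output directory.

```python
import tensorflow as tf

output_dir = '/path/to/output_dir'  # the --output_dir used above

# Shard naming is an assumption; adjust the pattern to what
# build_step_data.py actually wrote.
shards = tf.io.gfile.glob(output_dir + '/train*.tfrecord*')

# Count records across all training shards.
num_records = sum(1 for _ in tf.data.TFRecordDataset(shards))
print(f'{len(shards)} shards, {num_records} records')
```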

Optionally, you can pass the flag `--use_two_frames` to encode two consecutive
frames into the tfrecord files, as shown below.
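
For example (assuming `--use_two_frames` is a boolean flag):

```bash
python deeplab2/data/build_step_data.py \
  --step_root=${KITTI_STEP_ROOT} \
  --output_dir=${OUTPUT_DIR} \
  --use_two_frames
```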

## Citing KITTI-STEP

If you find this dataset helpful in your research, please use the following
BibTeX entry.

```
@article{step_2021,
  author={Mark Weber and Jun Xie and Maxwell Collins and Yukun Zhu and Paul Voigtlaender and Hartwig Adam and Bradley Green and Andreas Geiger and Bastian Leibe and Daniel Cremers and Aljosa Osep and Laura Leal-Taixe and Liang-Chieh Chen},
  title={{STEP}: Segmenting and Tracking Every Pixel},
  journal={arXiv:2102.11859},
  year={2021}
}
```