# FAQ

________________________________________________________________________________
**Q1: What should I do if I encounter OOM (out-of-memory) while training the
models?**

**A1**: To avoid OOM, you could try:

1. reducing the training crop size (i.e., the flag `crop_size` in
   `train_dataset_options`; see Q2 for more details), which reduces the
   input size during training,
2. using a larger output stride (e.g., 32) in the backbone (i.e., the flag
   `output_stride` in `model_options`; see Q3 for more details), which
   reduces the usage of atrous convolution,
3. using a smaller backbone, such as ResNet-50.
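
For concreteness, the sketch below shows how these three knobs might be set
programmatically. It assumes the compiled DeepLab2 protos are importable as
`deeplab2.config_pb2`; the field paths and the backbone name `'resnet50'`
follow the flags named in this FAQ, so double-check them against the
`config.proto` in your checkout.

```python
# A minimal sketch (not a drop-in recipe) of the three OOM-reducing knobs.
# Assumption: the compiled deeplab2 protos are available as config_pb2, and
# the field paths below match the flags named in this FAQ.
from deeplab2 import config_pb2

config = config_pb2.ExperimentOptions()
# 1. Reduce the training crop size (must satisfy output_stride * k + 1; see Q2).
config.train_dataset_options.crop_size[:] = [513, 513]
# 2. Use a larger output stride to reduce atrous convolution usage (see Q3).
config.model_options.backbone.output_stride = 32
# 3. Use a smaller backbone ('resnet50' is the assumed registry name).
config.model_options.backbone.name = 'resnet50'
```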
________________________________________________________________________________

**Q2: What `crop_size` do I need to set?**

**A2**: The DeepLab framework always uses a `crop_size` equal to
`output_stride` * k + 1, where k is an integer.

* During inference/evaluation, since the DeepLab framework uses whole-image
  inference, we need to set k so that the resulting `crop_size` (in
  `eval_dataset_options`) is slightly larger than the largest image dimension
  in the dataset. For example, we set the evaluation `crop_size` to 1025x2049
  for Cityscapes, whose images are all of dimension 1024x2048.
* During training, we could set k to be any integer as long as the resulting
  crop fits in your device memory. However, we observe better performance when
  the same `crop_size` is used during training and evaluation (i.e., the
  whole-image crop size is also used during training).
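
As a sanity check, the rule `crop_size = output_stride * k + 1` can be
computed directly; the helper below is our own illustration, not part of the
DeepLab2 API.

```python
# Smallest crop_size = output_stride * k + 1 that exceeds a given image
# dimension, per the rule in A2. (Illustrative helper, not DeepLab2 API.)
def smallest_valid_crop_size(image_dim: int, output_stride: int) -> int:
  k = -(-image_dim // output_stride)  # ceil(image_dim / output_stride)
  return output_stride * k + 1

# Cityscapes images are 1024x2048, so with output_stride = 16:
print(smallest_valid_crop_size(1024, 16))  # 1025
print(smallest_valid_crop_size(2048, 16))  # 2049
```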
________________________________________________________________________________

**Q3: What output stride should I use in the backbone?**

**A3**: Using a different output stride leads to a different accuracy-and-memory
trade-off. For example, DeepLabv1 uses output stride = 8, but it requires a lot
of device memory. In the DeepLabv3+ paper, we found that output stride = 16
strikes the best accuracy-and-memory trade-off, which is therefore our default
setting. If you wish to further reduce the memory usage, you could set the
output stride to 32. Additionally, we suggest adjusting the `atrous_rates` in
the ASPP module as follows.

* If `backbone.output_stride` = 32, use `atrous_rates` = [3, 6, 9].
* If `backbone.output_stride` = 16, use `atrous_rates` = [6, 12, 18].
* If `backbone.output_stride` = 8, use `atrous_rates` = [12, 24, 36].

Note that these settings may not be optimal. You may need to adjust them to
better fit your dataset.
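
The suggested pairings can be read as keeping the effective dilation on the
input roughly constant: rate * output_stride = 96 in every row. As a quick
reference (our own summary in code form, not part of the codebase):

```python
# Suggested atrous_rates keyed by backbone.output_stride (from A3).
# Note that rate * output_stride = 96 for every entry, i.e., the effective
# field of view on the input image stays roughly constant.
SUGGESTED_ATROUS_RATES = {
    32: [3, 6, 9],
    16: [6, 12, 18],
    8: [12, 24, 36],
}
```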
________________________________________________________________________________

**Q4: Why are the results reported by the provided evaluation code slightly
different from those of the official evaluation code (e.g.,
[Cityscapes](https://github.com/mcordts/cityscapesScripts))?**

**A4**: In order to run everything end-to-end in the TensorFlow system (e.g.,
the on-line evaluation during training), we re-implemented the evaluation code
in TensorFlow. Additionally, our whole system, including the training and
evaluation pipelines, uses the panoptic label format (i.e., `panoptic_label =
semantic_label * label_divisor + instance_id`, where the `label_divisor` should
be larger than the maximum number of instances per image), instead of the JSON
[COCO formats](https://cocodataset.org/#format-data). These two changes, along
with rounding and similar issues, result in some minor differences. Therefore,
our re-implemented evaluation code is mainly used for TensorFlow integration
(e.g., the support of on-line evaluation in TensorBoard). Users should run the
corresponding official evaluation code in order to compare with other published
papers. Note that all the numbers reported in our papers are evaluated with the
official evaluation code.
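
For clarity, the panoptic label format above can be encoded and decoded as
follows (a minimal sketch; `label_divisor = 1000` is only an example value,
and any value larger than the maximum number of instances per image works):

```python
# Encoding/decoding of the panoptic label format described in A4.
def encode_panoptic(semantic_label: int, instance_id: int,
                    label_divisor: int) -> int:
  return semantic_label * label_divisor + instance_id

def decode_panoptic(panoptic_label: int, label_divisor: int):
  return panoptic_label // label_divisor, panoptic_label % label_divisor

# Example with label_divisor = 1000 (must exceed the maximum number of
# instances per image in your dataset):
panoptic = encode_panoptic(semantic_label=11, instance_id=23,
                           label_divisor=1000)
assert panoptic == 11023
assert decode_panoptic(panoptic, label_divisor=1000) == (11, 23)
```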
To facilitate the conversion between prediction formats, we also provide
instructions for running the official evaluation code on
[Cityscapes](setup/cityscapes_test_server_evaluation.md) and
[COCO](setup/coco_test_server_evaluation.md).

________________________________________________________________________________
**Q5: What should I do if I cannot compile TensorFlow with the provided
efficient merging operation `merge_semantic_and_instance_maps`?**

**A5**: In this case, we provide a fallback solution that implements the
merging operation with pure tf functions. This fallback solution does not
require any TensorFlow compilation. However, note that compared to our provided
TensorFlow merging operation `merge_semantic_and_instance_maps`, its inference
speed is slower and the resulting segmentation performance may also be slightly
lower.

To use the pure-tf-function version of `merge_semantic_and_instance_maps`, set
`merge_semantic_and_instance_with_tf_op` to `false` in your config's
`evaluator_options`.
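
For example, if your experiment config is stored as protobuf text, the flag
could be flipped programmatically along these lines (a sketch under the
assumption that the compiled protos are importable as `deeplab2.config_pb2`;
the file name `experiment.textproto` is illustrative):

```python
# Switch to the pure-tf merging fallback by editing a textproto config.
# Assumptions: deeplab2's compiled protos are importable as config_pb2, and
# 'experiment.textproto' is a placeholder for your config file.
from google.protobuf import text_format
from deeplab2 import config_pb2

with open('experiment.textproto') as f:
  config = text_format.Parse(f.read(), config_pb2.ExperimentOptions())

# Prefer the pure-tf-function merging over the compiled custom op.
config.evaluator_options.merge_semantic_and_instance_with_tf_op = False

with open('experiment.textproto', 'w') as f:
  f.write(text_format.MessageToString(config))
```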