Pan&Scan is disabled by default?
Hi,
It looks like the pan and scan capability in the JSON config of the image>token is disabled by default. Is that an error?
Does it mean that by default the model can't accurately "read" large images?
Thanks in advance for the reply.
Hi @Jack-HR-Red, Gemma uses a fixed-resolution (896x896) vision encoder, which struggles with non-square or high-resolution images and can lose detail on them. To address this, an adaptive windowing algorithm ("Pan and Scan") is implemented during inference. This algorithm breaks images down into smaller, equal-sized crops, resizes them to 896x896, and then feeds them to the encoder. The windowing is applied only when needed and can be disabled for faster inference, which is a trade-off between speed and accuracy. Please have a look at the Gemma3 technical report for more details on Pan & Scan (P&S).
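To make the cropping step a bit more concrete, here is a rough sketch of how an image could be divided into equal-sized crop boxes before each one is resized to 896x896. The function name crop_boxes and the grid arguments are made up for illustration; this is not the library's own code, just one way to read the description above.

```python
import math

TARGET_SIZE = 896  # encoder input resolution mentioned in the reply above


def crop_boxes(width, height, crops_wide, crops_high):
    """Return (left, top, right, bottom) boxes for an equal-sized crop grid.

    Hypothetical helper for illustration only. Each box would afterwards be
    resized to TARGET_SIZE x TARGET_SIZE with an imaging library.
    """
    # Round the crop size up so the grid always covers the whole image.
    crop_w = math.ceil(width / crops_wide)
    crop_h = math.ceil(height / crops_high)
    boxes = []
    for row in range(crops_high):
        for col in range(crops_wide):
            left, top = col * crop_w, row * crop_h
            # Clip the last row/column to the image border.
            boxes.append((left, top, min(left + crop_w, width), min(top + crop_h, height)))
    return boxes


# A 2000x1000 image split into two side-by-side 1000x1000 crops.
print(crop_boxes(2000, 1000, 2, 1))
```

Square crops taken from a wide image keep their proportions when resized to 896x896, whereas resizing the whole wide image in one go would squash it.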
Hi Renu11
Thanks for the clear reply.
If I understand correctly, the pan&scan parameter in the config file is disabled by default, but the model automatically chooses to activate it when needed.
Is that correct?
Thanks in advance for the reply.
Hi @Jack-HR-Red! A slight correction to the above reply:
adaptive windowing algorithm ("Pan and Scan") is implemented during preprocessing, not inference.
Pan-and-scan is actually a function of the Gemma3ImageProcessor inside the Gemma3Processor. You have to explicitly enable it with kwargs when calling the processor, as shown in the following code snippet.
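from transformers import Gemma3Processor
# gemma_model_id, images, and text are assumed to be defined by the caller.
processor = Gemma3Processor.from_pretrained(gemma_model_id)
inputs = processor(images=images, text=text, images_kwargs={"do_pan_and_scan": True})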
Note that even if you enable Pan-and-Scan, it doesn't mean it will engage. There are a few (configurable) early exit conditions, including an aspect ratio threshold, a minimum crop size, and a maximum number of crops.
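As a rough illustration of how those exit conditions could interact, here is a simplified sketch. It is not the library's implementation: the function name plan_pan_and_scan, the parameter names, and the default values are all assumptions picked for the example, so check the actual preprocessor config for the real settings.

```python
import math


def plan_pan_and_scan(width, height, min_crop_size=256, max_num_crops=4,
                      min_ratio_to_activate=1.2):
    """Return (crops_wide, crops_high), or None if pan-and-scan would not engage.

    Simplified sketch; names and defaults are illustrative assumptions.
    """
    long_side, short_side = max(width, height), min(width, height)

    # Exit 1: the image is too close to square to benefit from cropping.
    if long_side / short_side < min_ratio_to_activate:
        return None

    # Pick a crop count near the aspect ratio (rounding half up), but never
    # more crops than the minimum crop size allows along the long side.
    num_crops = int(math.floor(long_side / short_side + 0.5))
    num_crops = min(int(math.floor(long_side / min_crop_size)), num_crops)

    # Keep the crop count within [2, max_num_crops].
    num_crops = max(2, num_crops)
    num_crops = min(max_num_crops, num_crops)

    # Exit 2: the resulting crops would be smaller than the minimum crop size.
    crop_long = int(math.ceil(long_side / num_crops))
    if min(crop_long, short_side) < min_crop_size:
        return None

    return (num_crops, 1) if width >= height else (1, num_crops)


print(plan_pan_and_scan(1000, 1000))  # None: square image, ratio below threshold
print(plan_pan_and_scan(2000, 1000))  # (2, 1): two crops side by side
print(plan_pan_and_scan(400, 200))    # None: crops would be under the minimum size
print(plan_pan_and_scan(8000, 1000))  # (4, 1): capped by the maximum number of crops
```

So with settings like these, a square or small image passes through untouched even when do_pan_and_scan is True, and only sufficiently elongated, sufficiently large images are split.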