Is Pan&Scan disabled by default?

#26

by Jack-HR-Red - opened Mar 22

Mar 22

Hi,

It looks like the Pan and Scan capability in the JSON config of the image-to-token processor is disabled by default. Is that an error?

Does it mean that, by default, the model can't accurately "read" large images?

Thanks in advance for the reply.

Renu11

Google org Mar 24

Hi @Jack-HR-Red, Gemma uses a fixed-resolution (896x896) vision encoder, which struggles with non-square or high-resolution images and can lose detail on them. To address this, an adaptive windowing algorithm ("Pan and Scan") is implemented during inference. This algorithm breaks an image down into smaller, equal-sized crops, resizes each crop to 896x896, and then feeds the crops to the encoder. The windowing is applied only when needed and can be disabled for faster inference, which is a trade-off between speed and accuracy. Please have a look at the Gemma 3 technical report for more details on Pan & Scan (P&S).
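To make the windowing step concrete, here is a minimal pure-Python sketch of how an image might be split into near-equal windows along its longer side. The function name `equal_crop_boxes` is hypothetical and the crop count is passed in by hand; the real Gemma3ImageProcessor picks the count itself and also resizes each crop to 896x896, which this sketch only notes in a comment.

```python
ENCODER_SIZE = 896  # fixed vision-encoder resolution mentioned in the reply


def equal_crop_boxes(width: int, height: int, num_crops: int) -> list[tuple[int, int, int, int]]:
    """Hypothetical helper: split an image into `num_crops` windows along its longer side.

    Returns (left, top, right, bottom) pixel boxes. Boundaries use integer
    division, so crops differ by at most one pixel when the size doesn't divide
    evenly. In a real pipeline each crop would then be resized to
    ENCODER_SIZE x ENCODER_SIZE before encoding (not done here).
    """
    if num_crops < 1:
        raise ValueError("num_crops must be >= 1")
    boxes = []
    if width >= height:
        # Landscape or square: cut vertical strips across the width.
        for i in range(num_crops):
            boxes.append((i * width // num_crops, 0, (i + 1) * width // num_crops, height))
    else:
        # Portrait: cut horizontal strips down the height.
        for i in range(num_crops):
            boxes.append((0, i * height // num_crops, width, (i + 1) * height // num_crops))
    return boxes


# A 2:1 panorama split into two square windows.
print(equal_crop_boxes(1792, 896, 2))
# [(0, 0, 896, 896), (896, 0, 1792, 896)]
```

Cutting along the longer side keeps each window close to square, so the later resize to 896x896 distorts it much less than squashing the whole image would.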

Jack-HR-Red

Mar 24

Hi Renu11,
thanks for the clear reply.
If I understand correctly, the pan&scan parameter in the config file is disabled by default, but the model automatically chooses to activate it when needed.
Is that correct?

Thanks in advance for the reply.

RyanMullins

Google org 20 days ago

Hi @Jack-HR-Red! A slight correction to the above reply:

adaptive windowing algorithm ("Pan and Scan") is implemented during ~~inference~~ preprocessing.

Pan-and-scan is actually a function of the Gemma3ImageProcessor inside the Gemma3Processor. You have to enable it explicitly with kwargs when calling the processor, as shown in the following code snippet.

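```python
from transformers import Gemma3Processor

processor = Gemma3Processor.from_pretrained(gemma_model_id)  # e.g. "google/gemma-3-4b-it"
# Pan-and-scan stays off unless requested per call via images_kwargs
inputs = processor(images=images, text=text, images_kwargs={"do_pan_and_scan": True})
```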

Note that enabling Pan-and-Scan doesn't guarantee that it will engage. There are a few (configurable) early-exit conditions, including an aspect-ratio threshold, a minimum crop size, and a maximum number of crops.
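The early-exit logic can be sketched as a small pure function that returns how many crops to take, or 0 when pan-and-scan should not engage. This is an illustrative sketch, not the library's code: the function and parameter names are hypothetical, and the default values (1.2 ratio, 256 px, 4 crops) are assumptions chosen for the example rather than values read from the Gemma 3 config.

```python
import math


def pan_and_scan_num_crops(
    width: int,
    height: int,
    min_ratio_to_activate: float = 1.2,  # assumed default
    min_crop_size: int = 256,            # assumed default, in pixels
    max_num_crops: int = 4,              # assumed default
) -> int:
    """Hypothetical helper: number of crops to take, or 0 if P&S should not engage."""
    long_side, short_side = max(width, height), min(width, height)

    # Exit 1: the image is too close to square to benefit from windowing.
    if long_side / short_side < min_ratio_to_activate:
        return 0

    # Aim for roughly square crops: about one crop per short side along the long side.
    num_crops = math.floor(long_side / short_side + 0.5)
    # Don't plan more crops than min_crop_size allows along the long side.
    num_crops = min(num_crops, long_side // min_crop_size)
    # Clamp to the range [2, max_num_crops].
    num_crops = max(2, num_crops)
    num_crops = min(max_num_crops, num_crops)

    # Exit 2: the resulting crops would still fall below the minimum size.
    crop_long = math.ceil(long_side / num_crops)
    if min(crop_long, short_side) < min_crop_size:
        return 0
    return num_crops


for size in [(1000, 1000), (1792, 896), (4000, 500), (600, 200)]:
    print(size, pan_and_scan_num_crops(*size))
# (1000, 1000) 0   -> nearly square, ratio check exits early
# (1792, 896) 2    -> two square windows
# (4000, 500) 4    -> would want 8, capped at max_num_crops
# (600, 200) 0     -> crops would be smaller than min_crop_size
```

Under these assumptions, a wide banner is capped at the maximum crop count rather than skipped, while a small image with an extreme aspect ratio is skipped because its crops would be too small to be useful.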
