The checkpoint you are trying to load has model type `gemma3` but Transformers does not recognize this architecture
Trying to deploy google/gemma-3-4b-it on a vLLM server on Azure Kubernetes and getting the error below, using the latest vLLM image.
ValueError: The checkpoint you are trying to load has model type `gemma3` but Transformers does not recognize this architecture. This could be because of an issue with the checkpoint, or because your version of Transformers is out of date. You can update Transformers with the command `pip install --upgrade transformers`. If this does not work, and the checkpoint is very new, then there may not be a release version that supports this model yet. In this case, you can get the most up-to-date code by installing Transformers from source with the command `pip install git+https://github.com/huggingface/transformers.git`.
Gemma 3 is not yet supported by the stable transformers release and requires a development build. Please install it from the dedicated release tag using the provided command - pip install git+https://github.com/huggingface/[email protected] - and try again. Let us know if the issue still persists. Thank you.
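To confirm which transformers version the server is actually importing, you can check inside the running container (the pod name below is just a placeholder):

kubectl exec -it <vllm-pod-name> -- python3 -c "import transformers; print(transformers.__version__)"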
@Renu11 how do I run this command? I am deploying via Kubernetes YAML; even when I tried adding it as part of an init container, I still get the same error. The model is available on a PVC.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vllm-gemma-3-27b-it
  namespace: ncgr-gemma-3-27b-it-ns
spec:
  replicas: 1
  selector:
    matchLabels:
      app: vllm-server
      model: gemma-3-27b-it
  template:
    metadata:
      labels:
        app: vllm-server
        model: gemma-3-27b-it
    spec:
      initContainers:
        - name: install-transformers
          image: python:3.9-slim
          command:
            - "/bin/bash"
            - "-c"
            - |
              apt-get update && apt-get install -y git && pip install git+https://github.com/huggingface/transformers
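      # NOTE: packages installed here go into the init container's own
      # filesystem, which is discarded when it exits; the vllm-server
      # container below starts from its own image and never sees them.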
      containers:
        - name: vllm-server
          image: vllm/vllm-openai:latest
          ports:
            - containerPort: 8000
          volumeMounts:
            - name: model-storage
              mountPath: /mnt/models
          command: ["python3", "-m", "vllm.entrypoints.openai.api_server"]
          args:
            - "--model"
            - "/mnt/models/gemma-3-27b-it"
            - "--api-key"
            - "$(VLLM_API_KEY)"
            - "--tensor-parallel-size"
            - "1"
            - "--dtype"
            - "float16"
            - "--port"
            - "8000"
            - "--max-model-len"
            - "128000"
            - "--max-num-batched-tokens"
            - "128000"
            - "--max-num-seqs"
            - "16"
            - "--gpu-memory-utilization"
            - "0.98"
            - "--served-model-name"
            - "Gemma-3-27B-IT"
            - "--disable-log-requests"
            - "--enable-chunked-prefill"
            - "--enable-prefix-caching"
          resources:
            requests:
              cpu: "16"
              memory: "100Gi"
              nvidia.com/gpu: "1"
            limits:
              cpu: "24"
              memory: "160Gi"
              nvidia.com/gpu: "1"
          env:
            - name: NCCL_DEBUG
              value: "INFO"
            - name: VLLM_API_KEY
              valueFrom:
                secretKeyRef:
                  name: api-key-secret
                  key: api-key
      volumes:
        - name: model-storage
          persistentVolumeClaim:
            claimName: gemma-3-27b-it-pvc
Hi @bajajrahul03, the latest stable transformers release is now out and is compatible with Gemma 3 models. You can install it using the command - pip install -U transformers.
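For the Kubernetes deployment above, note that a pip install in an init container does not carry over to the vllm-server container, since each container has its own filesystem. One way to make the upgrade take effect is to run it in the main container itself, just before launching the server. A minimal sketch of the container section, assuming the same vllm/vllm-openai:latest image and /mnt/models/gemma-3-27b-it model path as in your manifest (the remaining flags are trimmed for brevity):

      containers:
        - name: vllm-server
          image: vllm/vllm-openai:latest
          command: ["/bin/bash", "-c"]
          args:
            - |
              # Upgrade transformers in the container that actually runs vLLM,
              # then start the server (re-add your other flags as needed).
              pip install -U transformers &&
              exec python3 -m vllm.entrypoints.openai.api_server \
                --model /mnt/models/gemma-3-27b-it \
                --port 8000 \
                --served-model-name Gemma-3-27B-IT

The inline install is the quickest way to test the fix; for production, baking the upgrade into a custom image built on top of vllm/vllm-openai:latest avoids re-downloading the package on every pod restart.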
ok, will try and let you know