KeyError: 'general.name'
There is no "general.name" key in the GGUF metadata, so the Transformers library (used by vLLM) can't autoload the model. Please fix.
Docker CMD
docker run --gpus all -d \
--name vllm_gemmaQ4_2gpus_8000 \
-v ~/.cache/huggingface/hub:/models \
--env "HUGGING_FACE_HUB_TOKEN=<YOUR_HF_TOKEN>" \
-p 8000:8000 \
--ipc=host \
vllm/vllm-openai:latest \
--model /models/Gemma3-27B-it-Q4-QAT-GGUF/gemma-3-27b-it-q4_0.gguf \
--max-model-len 32768 \
--gpu-memory-utilization 0.95 \
--chat-template /models/Gemma3-27B-it-Q4-QAT-GGUF/gemma3_chat_template.jinja \
--tensor-parallel-size 2
Error
INFO 04-07 05:56:24 [__init__.py:256] Automatically detected platform cuda.
INFO 04-07 05:56:26 [api_server.py:977] vLLM API server version 0.8.0
INFO 04-07 05:56:26 [api_server.py:978] args: Namespace(host=None, port=8000, uvicorn_log_level='info', allow_credentials=False, allowed_origins=['*'], allowed_methods=['*'], allowed_headers=['*'], api_key=None, lora_modules=None, prompt_adapters=None, chat_template='/models/Gemma3-27B-it-Q4-QAT-GGUF/gemma3_chat_template.jinja', chat_template_content_format='auto', response_role='assistant', ssl_keyfile=None, ssl_certfile=None, ssl_ca_certs=None, enable_ssl_refresh=False, ssl_cert_reqs=0, root_path=None, middleware=[], return_tokens_as_token_ids=False, disable_frontend_multiprocessing=False, enable_request_id_headers=False, enable_auto_tool_choice=False, tool_call_parser=None, tool_parser_plugin='', model='/models/Gemma3-27B-it-Q4-QAT-GGUF/gemma-3-27b-it-q4_0.gguf', task='auto', tokenizer=None, hf_config_path=None, skip_tokenizer_init=False, revision=None, code_revision=None, tokenizer_revision=None, tokenizer_mode='auto', trust_remote_code=False, allowed_local_media_path=None, download_dir=None, load_format='auto', config_format=<ConfigFormat.AUTO: 'auto'>, dtype='auto', kv_cache_dtype='auto', max_model_len=32768, guided_decoding_backend='xgrammar', logits_processor_pattern=None, model_impl='auto', distributed_executor_backend=None, pipeline_parallel_size=1, tensor_parallel_size=2, enable_expert_parallel=False, max_parallel_loading_workers=None, ray_workers_use_nsight=False, block_size=None, enable_prefix_caching=None, disable_sliding_window=False, use_v2_block_manager=True, num_lookahead_slots=0, seed=None, swap_space=4, cpu_offload_gb=0, gpu_memory_utilization=0.95, num_gpu_blocks_override=None, max_num_batched_tokens=None, max_num_partial_prefills=1, max_long_partial_prefills=1, long_prefill_token_threshold=0, max_num_seqs=None, max_logprobs=20, disable_log_stats=False, quantization=None, rope_scaling=None, rope_theta=None, hf_overrides=None, enforce_eager=False, max_seq_len_to_capture=8192, disable_custom_all_reduce=False, tokenizer_pool_size=0, tokenizer_pool_type='ray', tokenizer_pool_extra_config=None, limit_mm_per_prompt=None, mm_processor_kwargs=None, disable_mm_preprocessor_cache=False, enable_lora=False, enable_lora_bias=False, max_loras=1, max_lora_rank=16, lora_extra_vocab_size=256, lora_dtype='auto', long_lora_scaling_factors=None, max_cpu_loras=None, fully_sharded_loras=False, enable_prompt_adapter=False, max_prompt_adapters=1, max_prompt_adapter_token=0, device='auto', num_scheduler_steps=1, use_tqdm_on_load=True, multi_step_stream_outputs=True, scheduler_delay_factor=0.0, enable_chunked_prefill=None, speculative_model=None, speculative_model_quantization=None, num_speculative_tokens=None, speculative_disable_mqa_scorer=False, speculative_draft_tensor_parallel_size=None, speculative_max_model_len=None, speculative_disable_by_batch_size=None, ngram_prompt_lookup_max=None, ngram_prompt_lookup_min=None, spec_decoding_acceptance_method='rejection_sampler', typical_acceptance_sampler_posterior_threshold=None, typical_acceptance_sampler_posterior_alpha=None, disable_logprobs_during_spec_decoding=None, model_loader_extra_config=None, ignore_patterns=[], preemption_mode=None, served_model_name=None, qlora_adapter_name_or_path=None, show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None, disable_async_output_proc=False, scheduling_policy='fcfs', scheduler_cls='vllm.core.scheduler.Scheduler', override_neuron_config=None, override_pooler_config=None, compilation_config=None, kv_transfer_config=None, worker_cls='auto', worker_extension_cls='', 
generation_config='auto', override_generation_config=None, enable_sleep_mode=False, calculate_kv_scales=False, additional_config=None, enable_reasoning=False, reasoning_parser=None, disable_log_requests=False, max_log_len=None, disable_fastapi_docs=False, enable_prompt_tokens_details=False, enable_server_load_tracking=False)
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "/opt/venv/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 1059, in <module>
uvloop.run(run_server(args))
File "/opt/venv/lib/python3.12/site-packages/uvloop/__init__.py", line 109, in run
return __asyncio.run(
^^^^^^^^^^^^^^
File "/root/.local/share/uv/python/cpython-3.12.9-linux-x86_64-gnu/lib/python3.12/asyncio/runners.py", line 195, in run
return runner.run(main)
^^^^^^^^^^^^^^^^
File "/root/.local/share/uv/python/cpython-3.12.9-linux-x86_64-gnu/lib/python3.12/asyncio/runners.py", line 118, in run
return self._loop.run_until_complete(task)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
File "/opt/venv/lib/python3.12/site-packages/uvloop/__init__.py", line 61, in wrapper
return await main
^^^^^^^^^^
File "/opt/venv/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 1012, in run_server
async with build_async_engine_client(args) as engine_client:
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/.local/share/uv/python/cpython-3.12.9-linux-x86_64-gnu/lib/python3.12/contextlib.py", line 210, in __aenter__
return await anext(self.gen)
^^^^^^^^^^^^^^^^^^^^^
File "/opt/venv/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 141, in build_async_engine_client
async with build_async_engine_client_from_engine_args(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/.local/share/uv/python/cpython-3.12.9-linux-x86_64-gnu/lib/python3.12/contextlib.py", line 210, in __aenter__
return await anext(self.gen)
^^^^^^^^^^^^^^^^^^^^^
File "/opt/venv/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 161, in build_async_engine_client_from_engine_args
vllm_config = engine_args.create_engine_config(usage_context=usage_context)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/venv/lib/python3.12/site-packages/vllm/engine/arg_utils.py", line 1206, in create_engine_config
model_config = self.create_model_config()
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/venv/lib/python3.12/site-packages/vllm/engine/arg_utils.py", line 1121, in create_model_config
return ModelConfig(
^^^^^^^^^^^^
File "/opt/venv/lib/python3.12/site-packages/vllm/config.py", line 333, in __init__
hf_config = get_config(self.hf_config_path or self.model,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/venv/lib/python3.12/site-packages/vllm/transformers_utils/config.py", line 280, in get_config
config_dict, _ = PretrainedConfig.get_config_dict(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/venv/lib/python3.12/site-packages/transformers/configuration_utils.py", line 594, in get_config_dict
config_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/venv/lib/python3.12/site-packages/transformers/configuration_utils.py", line 685, in _get_config_dict
config_dict = load_gguf_checkpoint(resolved_config_file, return_tensors=False)["config"]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/venv/lib/python3.12/site-packages/transformers/modeling_gguf_pytorch_utils.py", line 370, in load_gguf_checkpoint
model_name = read_field(reader, "general.name")
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/venv/lib/python3.12/site-packages/transformers/modeling_gguf_pytorch_utils.py", line 261, in read_field
value = reader.fields[field]
~~~~~~~~~~~~~^^^^^^^
KeyError: 'general.name'
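The same failure reproduces outside of vLLM with a few lines of transformers against the same file (a minimal sketch, assuming the gguf package is installed and that the path points at the same GGUF the container mounts):

# Sketch: transformers rebuilds the HF config from the GGUF metadata and reads
# "general.name" unconditionally, so a file without that key raises.
from transformers.modeling_gguf_pytorch_utils import load_gguf_checkpoint

gguf_path = "/models/Gemma3-27B-it-Q4-QAT-GGUF/gemma-3-27b-it-q4_0.gguf"

# Raises KeyError: 'general.name' on this particular file.
config = load_gguf_checkpoint(gguf_path, return_tensors=False)["config"]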
+1
@Venom2212 btw, you will not be able to run this model with "vllm/vllm-openai:latest" anyway, as the transformers library (used by vLLM) does not currently support the Gemma 3 architecture in GGUF format. See https://github.com/huggingface/transformers/issues/37002
There are also these warnings in llama.cpp, but I'm not sure whether they're problematic or not.
load: control-looking token: 106 '' was not control-type; this is probably a bug in the model. its type will be overridden
load: special_eos_id is not in special_eog_ids - the tokenizer config may be incorrect
IMHO it looks like a bug (or a deliberate choice) in this GGUF file. According to the GGUF spec (https://github.com/ggml-org/ggml/blob/master/docs/gguf.md) there should be a "general.name" key. In the previous Gemma GGUF (https://huggingface.co/google/gemma-2-2b-GGUF/tree/main?show_file_info=2b_pt_v2.gguf) it is present; here it is missing.
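You can check this directly with the gguf Python package (a quick sketch; adjust the path to wherever the file lives locally):

# Sketch: list the general.* metadata keys and check whether "general.name"
# is present; transformers' load_gguf_checkpoint reads it unconditionally,
# hence the KeyError above.
from gguf import GGUFReader

reader = GGUFReader("gemma-3-27b-it-q4_0.gguf")

print("general.name" in reader.fields)                        # False for this file
print([key for key in reader.fields if key.startswith("general.")])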
@lkv can you look at this please?
The vLLM and transformers projects have made adjustments in their main branches to support the GGUF version of Gemma 3.
Download quants from here:
https://huggingface.co/stduhpf
We made sure to include the control token fixes and to add the name back.
Furthermore, these quants have the embeddings table quantized at Q6_K, which results in a significant speed boost without any noticeable degradation in performance.
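If you want to double-check a downloaded file before pointing vLLM at it, a small sketch with the gguf package shows both the restored name key and the quantization type of the embedding tensor (the filename below is just a placeholder; "token_embd.weight" is the standard llama.cpp name for the token-embedding tensor):

# Sketch: verify the fixed metadata and the embedding quantization type.
from gguf import GGUFReader

reader = GGUFReader("your-downloaded-quant.gguf")  # placeholder filename

print("general.name" in reader.fields)  # should be True for these quants

for tensor in reader.tensors:
    if tensor.name == "token_embd.weight":
        print(tensor.name, tensor.tensor_type.name)  # expected: Q6_K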