Error loading Llama 3.3 70B FP4 model

#4 by rpeinl

Hi,
I tried the inference code you provide on the model card with TensorRT-LLM (roughly the script sketched after the traceback below). It runs just fine with TinyLlama 1.1B. However, when I run it with your FP4 version of Llama 3.3 70B on my H100, it throws the following error.

File /opt/conda/lib/python3.10/site-packages/tensorrt_llm/quantization/functional.py:1106, in fp4_gemm(input, input_sf, weight, weight_sf, global_sf, output_dtype, scaling_vector_size)
1104 plug_inputs = [input, input_sf, weight, weight_sf, global_sf]
1105 plug_inputs = [i.trt_tensor for i in plug_inputs]
-> 1106 layer = default_trtnet().add_plugin_v2(plug_inputs, fp4_gemm_plug)
1107 _add_plugin_info(layer, fp4_gemm_plg_creator, "fp4_gemm", pfc)
1108 output = _create_tensor(layer.get_output(0), layer)

TypeError: add_plugin_v2(): incompatible function arguments. The following argument types are supported:
1. (self: tensorrt_bindings.tensorrt.INetworkDefinition, inputs: List[tensorrt_bindings.tensorrt.ITensor], plugin: tensorrt_bindings.tensorrt.IPluginV2) -> tensorrt_bindings.tensorrt.IPluginV2Layer

Invoked with: <tensorrt_bindings.tensorrt.INetworkDefinition object at 0x7fd358624b30>, [<tensorrt_bindings.tensorrt.ITensor object at 0x7fd35861c570>, <tensorrt_bindings.tensorrt.ITensor object at 0x7fd35861cf70>, <tensorrt_bindings.tensorrt.ITensor object at 0x7fd35861deb0>, <tensorrt_bindings.tensorrt.ITensor object at 0x7fd35861d5f0>, <tensorrt_bindings.tensorrt.ITensor object at 0x7fd35861e330>], None
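For reference, the script I'm running is essentially the TensorRT-LLM LLM-API quickstart from the model card. The sketch below is an approximation of it; the local model path and sampling settings are my own and not taken verbatim from the card:

```python
from tensorrt_llm import LLM, SamplingParams

def main():
    prompts = ["The capital of France is"]
    # Sampling settings are my own choices, not from the model card.
    sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

    # Local download of this FP4 checkpoint (path is from my setup).
    # The same call works with TinyLlama 1.1B but fails with this model.
    llm = LLM(model="./Llama-3.3-70B-Instruct-FP4")

    for output in llm.generate(prompts, sampling_params):
        print(output.prompt, "->", output.outputs[0].text)

if __name__ == "__main__":
    main()
```

As far as I can tell, the failure happens while TensorRT-LLM is still building the network for the FP4 checkpoint (the `add_plugin_v2` call in the traceback), before any generation runs.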
