Model usage on vLLM fails: `No available memory for the cache blocks` & `Error executing method 'determine_num_available_blocks'`
2
#8 opened 11 days ago by surajd

Does the benchmark test use vLLM? input/output=500/2000?
1
#6 opened 14 days ago by chuanyizjc

FP8 and FP4
#5 opened 15 days ago by whatever1983

how to reproduce the benchmark score?
#4 opened 19 days ago by lincharliesun

AWQ OR GPTQ Quant
1
1
#2 opened 20 days ago by getfit

"ffn_mult": null,
1
9
#1 opened 20 days ago by csabakecskemeti
