Mungert/Qwen2.5-VL-72B-Instruct-GGUF

10 days ago

Hi, I'm getting gibberish responses when using the IQ4_XS quant, both with a recent llama.cpp commit and with a commit from a month ago (which is when the quants seem to have been released). Any idea whether the quants got messed up, or do I need to use a specific branch instead of llama.cpp master? For reference, the commit I'm using is 87616f068094.

Mungert

Owner 10 days ago

The quant metadata looks OK, but I have not tested it. Is this the only quant you are having the issue with? IQ4_NL and IQ4_XS use non-linear values, so they can give different results from other quants. Try Q4_0; this should be the most compatible quant.
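To make the "non-linear values" point concrete, here is a rough sketch of the difference between a linear 4-bit decode and a lookup-table decode. It is a simplified illustration, not the llama.cpp implementation: the function names are made up for this example, the table values are illustrative (the real table is in the ggml source), and real blocks also pack codes and per-block scales in a specific layout that is ignored here.

```python
# Illustrative only: both decoders map a 4-bit code (0..15) to a float weight.
# NONLINEAR_LEVELS is an assumed table for this example; check the ggml
# source for the values actually used by IQ4_NL / IQ4_XS.
NONLINEAR_LEVELS = [-127, -104, -83, -65, -49, -35, -22, -10,
                    1, 13, 25, 38, 53, 69, 89, 113]


def dequant_linear(codes, scale):
    """Q4_0-style decode: evenly spaced levels centred on code 8."""
    return [scale * (c - 8) for c in codes]


def dequant_nonlinear(codes, scale):
    """IQ4-style decode: levels come from a lookup table, so spacing is uneven."""
    return [scale * NONLINEAR_LEVELS[c] for c in codes]


def step_sizes(levels):
    """Gaps between consecutive decoded levels."""
    return [b - a for a, b in zip(levels, levels[1:])]


linear_steps = step_sizes(dequant_linear(range(16), 1.0))
nonlinear_steps = step_sizes(dequant_nonlinear(range(16), 1.0))
print("linear gaps:    ", linear_steps)
print("non-linear gaps:", nonlinear_steps)
```

The linear decoder has the same gap between every pair of levels, while the table-based one packs levels closer together near zero. A backend that decodes the table incorrectly would produce wrong weights for these quants only, which is one reason to compare against Q4_0.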

numiros

9 days ago

edited 9 days ago

Thanks! I did some more digging and found that this quant works fine with kobold.cpp's latest release, but not with llama.cpp. I guess there might be a bug in llama.cpp (it didn't work with either my own build or the Vulkan build they provide) - I'll update here if I figure something out.

Also, just in case - do you remember what commit you used for generating the ggufs?

Mungert

Owner 9 days ago

Thank you for the info, and please do update; that's really helpful. Sorry, I don't have the commit version. I will log the commit version for future uploads so it's easy to trace issues like this.
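One possible way to do that logging is sketched below. It assumes the quants are produced from a local git checkout of llama.cpp; the helper names, the `.build.json` sidecar file, and its field names are all made up for illustration and are not part of any existing tooling.

```python
import json
import subprocess
from datetime import datetime, timezone
from pathlib import Path


def current_commit(repo_dir):
    """Return the checked-out commit hash of the git repo at repo_dir."""
    result = subprocess.run(
        ["git", "-C", str(repo_dir), "rev-parse", "HEAD"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def write_build_record(gguf_path, commit, quant_type):
    """Write <file>.build.json next to the GGUF, recording the commit used."""
    gguf_path = Path(gguf_path)
    record = {
        "file": gguf_path.name,
        "quant_type": quant_type,
        "llama_cpp_commit": commit,
        "created_utc": datetime.now(timezone.utc).isoformat(),
    }
    # Sidecar name is a hypothetical convention: model.gguf -> model.gguf.build.json
    record_path = gguf_path.with_name(gguf_path.name + ".build.json")
    record_path.write_text(json.dumps(record, indent=2))
    return record_path
```

After each quantize run this could be called as `write_build_record("model-IQ4_XS.gguf", current_commit("llama.cpp"), "IQ4_XS")` (both paths are placeholders), and the resulting JSON uploaded alongside the GGUF so that a report like this one can be matched to an exact commit.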


IQ4_XS seems broken