IQ4_XS seems broken
Hi, I'm getting gibberish responses when using the IQ4_XS quant, both with a recent llama.cpp commit and with a commit from a month ago (which is when the quants seem to have been released). Any idea whether the quants got messed up, or do I need to use some specific branch instead of llama.cpp master? For reference, the commit I'm using is 87616f068094.
The quant metadata looks OK, but I have not tested it. Is this the only quant you are having the issue with? IQ4_NL and IQ4_XS use non-linear values, so they can give different results from other quants. Try Q4_0; it should be the most compatible quant.
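To make "non-linear values" a bit more concrete, here is a rough Python sketch comparing a Q4_0-style linear mapping with a lookup-table mapping. The table values are what I recall from llama.cpp's `kvalues_iq4nl` array, so treat them as an assumption and check the source. The function names are made up for illustration, and the sketch ignores block layout and scale packing entirely.

```python
# Assumption: 16-entry codebook as recalled from llama.cpp's kvalues_iq4nl table.
# Verify against the llama.cpp source before relying on these numbers.
KVALUES_IQ4NL = [-127, -104, -83, -65, -49, -35, -22, -10,
                 1, 13, 25, 38, 53, 69, 89, 113]


def dequant_linear(q: int, d: float) -> float:
    """Q4_0-style: the 4-bit index maps to evenly spaced levels."""
    return d * (q - 8)


def dequant_nonlinear(q: int, d: float) -> float:
    """IQ4_NL-style: the 4-bit index selects a level from a fixed codebook."""
    return d * KVALUES_IQ4NL[q]


def level_gaps(levels):
    """Differences between consecutive reconstruction levels."""
    return [b - a for a, b in zip(levels, levels[1:])]


if __name__ == "__main__":
    linear = [dequant_linear(q, 1.0) for q in range(16)]
    nonlinear = [dequant_nonlinear(q, 1.0) for q in range(16)]
    print("linear gaps:    ", level_gaps(linear))
    print("non-linear gaps:", level_gaps(nonlinear))
```

The linear levels are evenly spaced, while the codebook levels are denser near zero. A runtime or backend that lacks (or mis-implements) the lookup path for these types would decode the same bits into different weights, which is one plausible way to end up with gibberish from an otherwise valid file.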
Thanks! I did some more digging and found that this quant works fine with kobold.cpp's latest release, but not with llama.cpp. I guess there might be some bug with llama.cpp (didn't work with either my build or the vulkan one provided by them) - I'll update here if I figure out something.
Also, just in case - do you remember what commit you used for generating the ggufs?
Thank you for the info, and please do update; that's really helpful. I don't have the commit version, sorry. I will be logging the commit version for future uploads so it's easy to trace issues like this.
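For what it's worth, one way the commit logging could look is a small sidecar file written next to each GGUF. This is only a sketch under assumptions: the helper names, the `.provenance.json` suffix, and the record fields are all hypothetical, and it assumes the quants are produced from a local git checkout of llama.cpp.

```python
import json
import subprocess
from pathlib import Path


def current_commit(repo_dir):
    """Return the HEAD commit hash of the git checkout at repo_dir."""
    out = subprocess.run(
        ["git", "-C", str(repo_dir), "rev-parse", "HEAD"],
        capture_output=True, text=True, check=True,
    )
    return out.stdout.strip()


def write_provenance(gguf_path, commit, quant_type):
    """Write a JSON sidecar (hypothetical naming) recording how a GGUF was made."""
    record = {
        "file": Path(gguf_path).name,
        "quant_type": quant_type,
        "llama_cpp_commit": commit,
    }
    sidecar = Path(str(gguf_path) + ".provenance.json")
    sidecar.write_text(json.dumps(record, indent=2))
    return sidecar
```

Usage would be something like `write_provenance("model-IQ4_XS.gguf", current_commit("path/to/llama.cpp"), "IQ4_XS")`, with the sidecar uploaded alongside the quant (or its contents pasted into the model card).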