Re-converting the GGUF for MLA?

#15 opened by Silver267

Hi, now that MLA support is officially merged into llama.cpp, is it possible to re-convert the GGUF to the new format so that MLA works as described here? Thanks!
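For reference, here is roughly what I mean, a minimal sketch of the re-conversion step, assuming a llama.cpp checkout that already includes the MLA merge and the original HF safetensors weights on disk (all paths are hypothetical placeholders):

```python
# Sketch only: re-convert the original HF weights to GGUF with a
# post-MLA-merge llama.cpp, which (as I understand it) writes the extra
# attention tensors the new MLA cache layout needs.
import subprocess

HF_MODEL_DIR = "models/DeepSeek-R1"        # original HF weights (hypothetical path)
OUT_GGUF = "models/DeepSeek-R1-BF16.gguf"  # freshly converted GGUF (hypothetical path)

subprocess.run(
    [
        "python", "llama.cpp/convert_hf_to_gguf.py",
        HF_MODEL_DIR,
        "--outfile", OUT_GGUF,
        "--outtype", "bf16",  # keep full-ish precision here; quantize afterwards
    ],
    check=True,
)
```

The point being that MLA needs tensors produced at conversion time, so an old GGUF can't simply be patched in place; it has to be regenerated from the source weights.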

+1 to this please! A 16K cache currently uses ~80 GB of VRAM by default, while with MLA it would need only a few GB.

+1 Would be a game changer for long context work.

Is there a script available anywhere to dynamically quantize this model ourselves with the latest changes, in case Unsloth has no plans to? Something like the sketch below is what I have in mind.
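Not the Unsloth recipe (that script isn't public here, as far as I know), but as a stopgap you could re-quantize a freshly converted GGUF with llama.cpp's own llama-quantize. A minimal sketch, assuming llama.cpp is already built; paths and the quant preset are illustrative:

```python
# Sketch only: quantize the re-converted GGUF with a standard k-quant preset.
# Note this applies one preset uniformly, whereas Unsloth's dynamic quants
# pick different bit-widths per tensor, so this is only an approximation.
import subprocess

subprocess.run(
    [
        "llama.cpp/build/bin/llama-quantize",  # hypothetical build path
        "models/DeepSeek-R1-BF16.gguf",        # full-precision GGUF from the conversion step
        "models/DeepSeek-R1-Q4_K_M.gguf",      # quantized output (hypothetical path)
        "Q4_K_M",                              # standard k-quant preset
    ],
    check=True,
)
```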
