Re-converting the GGUF for MLA?
#15
by Silver267 - opened
+1 to this, please! A 16K context cache uses ~80 GB of VRAM by default, while with MLA it needs only a few GB.
+1 Would be a game changer for long context work.
Is there a script available anywhere so we can dynamically quantize this model ourselves with the latest changes, in case Unsloth has no plans to?
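For reference, the generic re-conversion path (not Unsloth's dynamic-quant recipe) is just llama.cpp's own converter plus `llama-quantize`. A minimal sketch, assuming a recent llama.cpp checkout whose converter emits the MLA tensors for this architecture; `MODEL_DIR`, the output filenames, and the `Q4_K_M` quant type are placeholder choices, not anything confirmed by Unsloth:

```shell
#!/usr/bin/env bash
# Sketch: re-convert a local Hugging Face checkpoint to GGUF with an
# MLA-aware llama.cpp, then quantize it. Paths below are placeholders.
set -e

MODEL_DIR="${MODEL_DIR:-/path/to/hf-checkpoint}"   # hypothetical path
OUT_F16="model-f16.gguf"
QUANT="Q4_K_M"                                      # plain k-quant, not a dynamic quant mix

reconvert() {
    # A recent llama.cpp checkout is needed so the converter writes the
    # tensors the MLA KV-cache path expects.
    git clone https://github.com/ggml-org/llama.cpp
    cd llama.cpp
    pip install -r requirements.txt
    cmake -B build && cmake --build build --config Release -j

    # 1) HF safetensors -> unquantized GGUF
    python convert_hf_to_gguf.py "$MODEL_DIR" --outfile "../$OUT_F16" --outtype f16

    # 2) Quantize the GGUF
    ./build/bin/llama-quantize "../$OUT_F16" "../model-${QUANT}.gguf" "$QUANT"
}

if [ -d "$MODEL_DIR" ]; then
    reconvert
else
    echo "Set MODEL_DIR to a local Hugging Face checkpoint before running."
fi
```

This only reproduces a standard static quant; matching Unsloth's per-layer dynamic quantization would additionally need their imatrix/quant-mix settings, which the plain `llama-quantize` call above does not apply.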