Re-converting the GGUF for MLA?

#15 opened by Silver267

Hi, now that MLA support is officially merged into llama.cpp, is it possible to re-convert the GGUF to the new format so that MLA works as described here? Thanks!
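For reference, here is roughly what I mean, a minimal sketch of the re-conversion step, assuming a llama.cpp checkout that already includes the MLA merge and the original HF safetensors weights on disk (all paths are hypothetical placeholders):

```python
# Sketch only: re-convert the original HF weights to GGUF with a
# post-MLA-merge llama.cpp, which (as I understand it) writes the extra
# attention tensors the new MLA cache layout needs.
import subprocess

HF_MODEL_DIR = "models/DeepSeek-R1"        # original HF weights (hypothetical path)
OUT_GGUF = "models/DeepSeek-R1-BF16.gguf"  # freshly converted GGUF (hypothetical path)

subprocess.run(
    [
        "python", "llama.cpp/convert_hf_to_gguf.py",
        HF_MODEL_DIR,
        "--outfile", OUT_GGUF,
        "--outtype", "bf16",  # keep full-ish precision here; quantize afterwards
    ],
    check=True,
)
```

The point being that MLA needs tensors produced at conversion time, so an old GGUF can't simply be patched in place; it has to be regenerated from the source weights.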

+1 to this please! A 16K cache currently uses ~80 GB of VRAM by default, while with MLA it would need only a few GB.

+1 Would be a game changer for long context work.

Is there a script available anywhere to dynamically quantize this model ourselves with the latest changes, in case Unsloth has no plans to? Something like the sketch below is what I have in mind.
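Not the Unsloth recipe (that script isn't public here, as far as I know), but as a stopgap you could re-quantize a freshly converted GGUF with llama.cpp's own llama-quantize. A minimal sketch, assuming llama.cpp is already built; paths and the quant preset are illustrative:

```python
# Sketch only: quantize the re-converted GGUF with a standard k-quant preset.
# Note this applies one preset uniformly, whereas Unsloth's dynamic quants
# pick different bit-widths per tensor, so this is only an approximation.
import subprocess

subprocess.run(
    [
        "llama.cpp/build/bin/llama-quantize",  # hypothetical build path
        "models/DeepSeek-R1-BF16.gguf",        # full-precision GGUF from the conversion step
        "models/DeepSeek-R1-Q4_K_M.gguf",      # quantized output (hypothetical path)
        "Q4_K_M",                              # standard k-quant preset
    ],
    check=True,
)
```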
