Very Large GGUFs collection: GGUF quantized versions of very large models (over 100B parameters).
Big thanks to ymcki for updating the llama.cpp code to support the 'dummy' layers. If it hasn't been merged yet, use the llama.cpp branch from this PR: https://github.com/ggml-org/llama.cpp/pull/12843
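If the PR is still unmerged, one way to fetch and build that branch locally is sketched below (the PR number comes from the link above; the clone path and build commands follow llama.cpp's standard CMake workflow and may differ on your setup):

```shell
# Sketch: build llama.cpp from the unmerged PR branch (#12843)
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp

# Fetch the PR head into a local branch and switch to it
git fetch origin pull/12843/head:pr-12843
git checkout pr-12843

# Standard CMake build (binaries land in build/bin)
cmake -B build
cmake --build build --config Release
```

Once the PR is merged into master, a plain clone and build of llama.cpp should work without the extra fetch/checkout steps.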
Note that the imatrix data used for the IQ quants was produced from the Q4 quant!
'Make knowledge free for everyone'
Quantized version of: nvidia/Llama-3_1-Nemotron-Ultra-253B-v1
Available quantizations: 1-bit, 2-bit, 3-bit, 4-bit
Base model: nvidia/Llama-3_1-Nemotron-Ultra-253B-v1