Very Large GGUFs collection: GGUF quantized versions of very large models (over 100B parameters).
Big thanks to ymcki for updating the llama.cpp code to support the 'dummy' layers. If it hasn't been merged yet, use the llama.cpp branch from this PR: https://github.com/ggml-org/llama.cpp/pull/12843
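If the PR is still unmerged, one way to fetch and build that branch locally is sketched below (the PR number comes from the link above; the clone path and build commands follow llama.cpp's standard CMake workflow and may differ on your setup):

```shell
# Sketch: build llama.cpp from the unmerged PR branch (#12843)
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp

# Fetch the PR head into a local branch and switch to it
git fetch origin pull/12843/head:pr-12843
git checkout pr-12843

# Standard CMake build (binaries land in build/bin)
cmake -B build
cmake --build build --config Release
```

Once the PR is merged into master, a plain clone and build of llama.cpp should work without the extra fetch/checkout steps.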
Note that the imatrix data used for the IQ quants was produced from the Q4 quant!
'Make knowledge free for everyone'
Quantized version of: nvidia/Llama-3_1-Nemotron-Ultra-253B-v1
Available quantizations: 1-bit, 2-bit, 3-bit, 4-bit
Base model: nvidia/Llama-3_1-Nemotron-Ultra-253B-v1