This is a reconversion / quantization of https://huggingface.co/NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO
There was a breaking change in llama.cpp's GGUF file format in https://github.com/ggerganov/llama.cpp/pull/6387, and the https://huggingface.co/NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO-GGUF repo hasn't been updated since. This prevents memory-mapping the model file, so loading takes much longer than necessary even when the file is already in the OS I/O cache.
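As an illustration, here is a minimal sketch of loading one of these reconverted files with memory-mapping enabled, using the llama-cpp-python bindings (an assumption; any llama.cpp frontend built after the linked PR works the same way). The GGUF file name below is hypothetical; substitute whichever quant you downloaded.

```python
# Minimal sketch, assuming llama-cpp-python is installed
# (pip install llama-cpp-python).
from llama_cpp import Llama

llm = Llama(
    # Hypothetical file name; use the quant file from this repo.
    model_path="Nous-Hermes-2-Mixtral-8x7B-DPO.Q4_K_M.gguf",
    use_mmap=True,  # memory-map the weights (default); fast reloads from cache
    n_ctx=4096,     # context window
)

out = llm("Q: What is the capital of France? A:", max_tokens=16)
print(out["choices"][0]["text"])
```

With `use_mmap=True` (the default in llama-cpp-python), a second load of the same file should be nearly instant once the weights are resident in the OS page cache, which is exactly what the pre-PR-6387 files prevented.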