Question about the differences between the three Breeze 2 GGUF models

#1 by NCGWRjason - opened

Hi there,

Thank you so much for providing the Breeze 2 8B GGUF models — I really appreciate your work!

I'm currently planning to use the Breeze 2 model for a long-text classification task. I noticed that there are three different versions available:

Llama-Breeze2-8B-Instruct-Text-i1-GGUF
Llama-Breeze2-8B-Instruct-Text-GGUF
Llama-Breeze2-8B-Instruct-text-only-GGUF

Could you please help clarify the main differences between these three versions?
Which one would be more suitable for long-text understanding and classification?

Also, I only have a 6GB GPU — would I be able to run the Q4_K_M.gguf quantized version with it?

Thanks again!

Llama-Breeze2-8B-Instruct-Text-i1-GGUF and Llama-Breeze2-8B-Instruct-Text-GGUF are quants of PenutChen/Llama-Breeze2-8B-Instruct-Text, while Llama-Breeze2-8B-Instruct-text-only-GGUF contains static quants of https://huggingface.co/voidful/Llama-Breeze2-8B-Instruct-text-only. Weighted/imatrix quants offer much better quality per size, so with you being limited to a 6 GB GPU you probably want to go with weighted/imatrix quants. I just queued imatrix quants for Llama-Breeze2-8B-Instruct-text-only, so you will soon have them as an option to choose from as well. You can check the progress at http://hf.tst.eu/status.html or regularly check the model summary page at https://hf.tst.eu/model#Llama-Breeze2-8B-Instruct-text-only-GGUF for the quants to appear. Once done, the weighted/imatrix quants will be available under Llama-Breeze2-8B-Instruct-text-only-i1-GGUF.
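In case it helps, here is a minimal sketch for pulling a single quant file with the huggingface_hub library. The exact filename is an assumption based on the usual `<model>.i1-<quant>.gguf` naming in these repositories, so verify it against the repo's file list first:

```python
# Minimal sketch: download one quant file from a quant repository.
# The filename follows the usual "<model>.i1-<quant>.gguf" convention
# used in these repos -- verify it against the repo's file list first.
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="mradermacher/Llama-Breeze2-8B-Instruct-Text-i1-GGUF",
    filename="Llama-Breeze2-8B-Instruct-Text.i1-Q4_K_M.gguf",  # assumed name
)
print(path)  # local path of the downloaded GGUF
```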

Thank you very much for your helpful response!

However, I’m still not quite sure which GGUF file would best suit my needs:
https://huggingface.co/mradermacher/Llama-Breeze2-8B-Instruct-Text-i1-GGUF/tree/main
https://huggingface.co/mradermacher/Llama-Breeze2-8B-Instruct-Text-GGUF/tree/main
https://huggingface.co/voidful/Llama-Breeze2-8B-Instruct-text-only/tree/main

Also, you mentioned that the Weighted/imatrix quantized models offer better quality—may I ask how I can tell if a specific GGUF file is a weighted/imatrix quant?
Thank you again for your time and guidance!

you mentioned that the Weighted/imatrix quantized models offer better quality—may I ask how I can tell if a specific GGUF file is a weighted/imatrix quant?

It's both in the title and in the first sentence of the model card. "i1" in the title means the repository contains weighted/imatrix quants, while no "i1" means static quants.
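If you want to check a downloaded file itself rather than rely on the repository name, recent llama.cpp quantizers record the imatrix source in the GGUF metadata (keys under `quantize.imatrix.*`). Here is a quick sketch with the `gguf` Python package; the metadata keys are an assumption that holds only for files produced by a sufficiently recent llama.cpp:

```python
# Sketch: inspect GGUF metadata for imatrix-related keys.
# Requires `pip install gguf`. Older quantizer versions may not have
# written the "quantize.imatrix.*" keys, so absence is not proof.
from gguf import GGUFReader

reader = GGUFReader("Llama-Breeze2-8B-Instruct-Text.i1-Q4_K_M.gguf")
imatrix_keys = [k for k in reader.fields if k.startswith("quantize.imatrix")]
if imatrix_keys:
    print("weighted/imatrix quant:", imatrix_keys)
else:
    print("no imatrix metadata found (static quant, or an older file)")
```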

Go for either one of the following. I would choose the largest quant that fits, or even better, visit the following URLs and get the one with the highest quality that fits (you might need to reload the page if it doesn't load):

https://hf.tst.eu/model#Llama-Breeze2-8B-Instruct-Text-GGUF

https://hf.tst.eu/model#Llama-Breeze2-8B-Instruct-text-only-GGUF

Thank you so much for your detailed response.

However, the two links you provided don't seem to be from Hugging Face, but from another website. Also, they don't appear to be the weighted/imatrix quant models you mentioned.

https://hf.tst.eu/model#Llama-Breeze2-8B-Instruct-Text-GGUF

https://hf.tst.eu/model#Llama-Breeze2-8B-Instruct-text-only-GGUF

Could you explain the difference between the "Text" and "Text-only" versions of these models?

Additionally, with my GPU having only 6GB of VRAM, which quantization level of the GGUF models would you recommend for local usage?

Thanks again!

However, the two links you provided don't seem to be from Hugging Face, but from another website. Also, they don't appear to be the weighted/imatrix quant models you mentioned.

This is the official team mradermacher download page. It is a web application that collects data from our repositories, the base model, and some static data to generate an overview page with much more information than Hugging Face shows you, and it automatically concatenates files larger than 50 GB. Most important for you is the "quality" column, which shows you how good each quant is.
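None of the roughly 5 GB quants relevant for a 6 GB card are split, but for the larger ones the concatenation the page does for you is plain byte concatenation, which you can also do by hand. A sketch, with hypothetical part names following the `.partXofY` scheme these repos use:

```python
# Sketch: rebuild a GGUF that was split for upload by concatenating
# its parts in order. The part names below are hypothetical examples
# of the ".partXofY" naming scheme; substitute your actual files.
import shutil

parts = [
    "some-large-model.Q8_0.gguf.part1of2",
    "some-large-model.Q8_0.gguf.part2of2",
]
with open("some-large-model.Q8_0.gguf", "wb") as out:
    for part in parts:
        with open(part, "rb") as src:
            shutil.copyfileobj(src, out)
```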

Could you explain the difference between the "Text" and "Text-only" versions of these models?

They are completely different models made by different authors: "Text" is by PenutChen and "text-only" by voidful. voidful is a much more popular and quite experienced model creator, so I would trust his work more, but in the end it's up to you to decide which model you like more. Maybe just try both and choose whichever you prefer.

Additionally, with my GPU having only 6GB of VRAM, which quantization level of the GGUF models would you recommend for local usage?

I would go for either i1-IQ4_XS or i1-Q4_K_M, depending on how much context you need: i1-IQ4_XS is smaller, which leaves more VRAM for the KV cache on long inputs.
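As a rough sizing note: an 8B model is roughly 4.9 GB at Q4_K_M and roughly 4.4 GB at IQ4_XS, so on 6 GB of VRAM the smaller quant leaves more headroom for the KV cache when your classification inputs are long. Here is a minimal sketch with llama-cpp-python, where the filename, context size, and layer count are assumptions to tune for your card:

```python
# Sketch: run a quant with GPU offload via llama-cpp-python
# (pip install llama-cpp-python, built with GPU support).
# Reduce n_gpu_layers or n_ctx if you run out of VRAM.
from llama_cpp import Llama

llm = Llama(
    model_path="Llama-Breeze2-8B-Instruct-Text.i1-IQ4_XS.gguf",  # assumed name
    n_gpu_layers=-1,  # offload all layers; lower this on OOM
    n_ctx=8192,       # longer context costs more KV-cache VRAM
)

out = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "Classify the text into one of: A, B, C."},
        {"role": "user", "content": "<long text to classify>"},
    ],
    max_tokens=8,  # a label needs only a few tokens
)
print(out["choices"][0]["message"]["content"])
```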
