Differences in quantization

by Varkoyote - opened 24 days ago

Hello! I wonder what the differences are between these quants, please:

It's the first time I've seen those last two! Thanks.

ZeroWw

Owner 21 days ago

q8 is the normal q8 you find everywhere else.
q8_p is converted with the PURE flag (all weights are q8).

q8q4 is Q4 for all weights except the OUTPUT and EMBED weights, giving the model a better balance at the cost of a slightly bigger memory footprint.
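
For anyone who wants to try something similar, here is a rough sketch of how the three variants might map onto llama.cpp's `llama-quantize` tool. This is only my guess at the workflow, not the exact commands used for this repo: the file names are made up, `Q4_K_M` is an assumed choice for the "Q4" part, and `q8_0` for the output and embedding tensors is inferred from the name q8q4. The script only builds and prints the command lines; it does not run anything.

```python
from typing import List, Optional


def quantize_cmd(
    src: str,
    dst: str,
    qtype: str,
    pure: bool = False,
    output_type: Optional[str] = None,
    embed_type: Optional[str] = None,
    binary: str = "llama-quantize",
) -> List[str]:
    """Build (but do not run) a llama-quantize command line."""
    cmd = [binary]
    if pure:
        # --pure: quantize all tensors to the same type, no mixtures
        cmd.append("--pure")
    if output_type:
        # override the type used for the output tensor
        cmd += ["--output-tensor-type", output_type]
    if embed_type:
        # override the type used for the token embedding tensor
        cmd += ["--token-embedding-type", embed_type]
    cmd += [src, dst, qtype]
    return cmd


# Hypothetical file names; the quant types for q8q4 are assumptions.
SRC = "model-f16.gguf"
variants = {
    "q8": quantize_cmd(SRC, "model.q8.gguf", "Q8_0"),
    "q8_p": quantize_cmd(SRC, "model.q8_p.gguf", "Q8_0", pure=True),
    "q8q4": quantize_cmd(
        SRC, "model.q8q4.gguf", "Q4_K_M",
        output_type="q8_0", embed_type="q8_0",
    ),
}

for name, cmd in variants.items():
    print(f"{name}: {' '.join(cmd)}")
```

If you have llama.cpp built locally, each list could be passed to `subprocess.run` to produce the file, and comparing the resulting file sizes would show the difference in footprint.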
