Differences in quantization

#1
by Varkoyote - opened

Hello! What are the differences between these quants, please?

  • q8
  • q8_p
  • q8q4

This is the first time I've seen the last two! Thanks.

q8 is the normal q8 you find everywhere else.
q8_p is converted with the PURE flag, so all weights are q8.

q8q4 is Q4 for all weights except the OUTPUT and the EMBED weights, giving the model a better balance at the cost of a slightly bigger memory footprint.
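
To make the difference concrete, here is a rough sketch of how the per-tensor choice could be modelled. It is only an illustration under assumptions: that the output and embedding weights in q8q4 are kept at Q8 (as the name suggests), that the types are the GGML block formats Q8_0 (34 bytes per 32 weights) and Q4_0 (18 bytes per 32 weights), and that the tensors use GGUF-style names. The helpers `pick_type` and `estimate_bytes` and the toy tensor list are hypothetical, not part of any converter.

```python
# Assumed GGML block formats: Q8_0 packs 32 weights into 34 bytes,
# Q4_0 packs 32 weights into 18 bytes.
BITS_PER_WEIGHT = {"q8_0": 34 * 8 / 32, "q4_0": 18 * 8 / 32}

# Assumed GGUF-style names for the embedding and output tensors.
HIGH_PRECISION_TENSORS = ("token_embd.weight", "output.weight")


def pick_type(tensor_name: str, scheme: str) -> str:
    """Return the quant type a tensor would get under a scheme (illustrative)."""
    if scheme == "q8_p":
        # PURE: every tensor gets the same type.
        return "q8_0"
    if scheme == "q8q4":
        # Assumption: output and embedding stay at Q8, everything else is Q4.
        return "q8_0" if tensor_name in HIGH_PRECISION_TENSORS else "q4_0"
    raise ValueError(f"unknown scheme: {scheme}")


def estimate_bytes(tensors: dict, scheme: str) -> float:
    """Estimate the total size in bytes of the quantized weights."""
    return sum(
        count * BITS_PER_WEIGHT[pick_type(name, scheme)] / 8
        for name, count in tensors.items()
    )


# Toy model: tensor name -> number of weights (made-up sizes).
toy = {
    "token_embd.weight": 32,
    "blk.0.ffn_up.weight": 64,
    "output.weight": 32,
}

for scheme in ("q8_p", "q8q4"):
    print(scheme, estimate_bytes(toy, scheme))
```

In a real model most of the weights sit in the per-layer tensors, so q8q4 lands much closer to a plain Q4 file than to q8 in size.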
