Differences in quantization
#1 opened by Varkoyote
Hello! I'm wondering what the differences are between these quants:
- q8
- q8_p
- q8q4
It's the first time I've seen those last two. Thanks!
q8 is the normal q8 you find everywhere else.
q8_p is converted with the PURE flag, so all weights are q8.
q8q4 uses Q4 for all weights except the OUTPUT and EMBED weights, which stay at Q8. This gives the model a better quality/size balance than a plain Q4, with a slightly bigger memory footprint.
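To make the trade-off concrete, here is a rough sketch of which type each tensor gets under q8_p and q8q4 and what that does to file size. It assumes (this is not stated in the thread) that "Q8" means Q8_0 and "Q4" means Q4_0, both stored as 32-weight blocks with one fp16 scale per block. The model shape and the all-Q4 baseline `q4_p` are hypothetical and exist only for comparison. The "normal" q8 is not modeled because the thread doesn't say how its per-tensor types differ from q8_p.

```python
# Rough sketch, not the actual conversion tool.
# Assumption: Q8 = Q8_0 and Q4 = Q4_0, each a 32-weight block plus a 16-bit fp16 scale.
BITS_PER_WEIGHT = {
    "Q8_0": (32 * 8 + 16) / 32,  # 8.5 bits per weight
    "Q4_0": (32 * 4 + 16) / 32,  # 4.5 bits per weight
}


def tensor_type(scheme: str, role: str) -> str:
    """Quant type for a tensor whose role is "embed", "output", or "other"."""
    if scheme == "q8_p":
        return "Q8_0"  # PURE: every weight tensor is Q8
    if scheme == "q8q4":
        # Embedding and output tensors stay at Q8; everything else drops to Q4.
        return "Q8_0" if role in ("embed", "output") else "Q4_0"
    if scheme == "q4_p":
        return "Q4_0"  # hypothetical all-Q4 baseline, for comparison only
    raise ValueError(f"unknown scheme: {scheme}")


def estimate_bytes(scheme: str, tensors: list[tuple[str, int]]) -> float:
    """Approximate quantized size in bytes for a list of (role, n_weights) pairs."""
    total_bits = sum(n * BITS_PER_WEIGHT[tensor_type(scheme, role)] for role, n in tensors)
    return total_bits / 8


# Hypothetical model: 32000-token vocab, hidden size 4096, 6.5e9 other weights.
TENSORS = [
    ("embed", 32000 * 4096),
    ("output", 32000 * 4096),
    ("other", 6_500_000_000),
]

if __name__ == "__main__":
    for scheme in ("q8_p", "q8q4", "q4_p"):
        print(f"{scheme}: {estimate_bytes(scheme, TENSORS) / 1e9:.2f} GB")
```

With these made-up numbers, q8q4 comes out only slightly larger than an all-Q4 file. That is because the embedding and output matrices are a small fraction of the total weights, so keeping them at Q8 costs little memory.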