Prompt time (ollama) on 22c Xeon, 5070 Ti, 128GB RAM (Q6_K_L)
#12 opened 4 days ago by MikeZeroTango
Template bug fixed in llama.cpp
#11 opened 11 days ago by matteogeniaccio
vllm deployment error
#10 opened 12 days ago by Saicy
Higher than usual refusal rate with Q6_K_L quant GGUF
#9 opened 13 days ago by smcleod
Tool use?
#8 opened 14 days ago by johnpyp
llama.cpp fixes have just been merged
#5 opened 16 days ago by Mushoz
LM Studio: unknown model architecture: 'glm4'?
#4 opened 20 days ago by DrNicefellow
Please regenerate GGUFs
#3 opened 24 days ago by jacek2024
Broken results
#2 opened 24 days ago by RamoreRemora
YaRN quantization for long context
#1 opened 24 days ago by sovetboga