DevQuasar

community

Verified

https://devquasar.com/

Activity Feed

AI & ML interests

Open-Source LLMs, Local AI Projects: https://pypi.org/project/llm-predictive-router/

Recent Activity

csabakecskemeti updated a model 9 minutes ago

DevQuasar/cognitivecomputations.Dolphin-Mistral-24B-Venice-Edition-GGUF

csabakecskemeti published a model about 3 hours ago

DevQuasar/cognitivecomputations.Dolphin-Mistral-24B-Venice-Edition-GGUF

csabakecskemeti updated a model about 3 hours ago

DevQuasar/nvidia.OpenCodeReasoning-Nemotron-14B-GGUF

View all activity

DevQuasar's activity

csabakecskemeti

updated a model 9 minutes ago

DevQuasar/cognitivecomputations.Dolphin-Mistral-24B-Venice-Edition-GGUF

Text Generation • Updated 9 minutes ago

csabakecskemeti

published a model about 3 hours ago

DevQuasar/cognitivecomputations.Dolphin-Mistral-24B-Venice-Edition-GGUF

Text Generation • Updated 9 minutes ago

csabakecskemeti

updated a model about 3 hours ago

DevQuasar/nvidia.OpenCodeReasoning-Nemotron-14B-GGUF

Text Generation • Updated about 3 hours ago • 25 • 1

csabakecskemeti

updated a model about 4 hours ago

DevQuasar/nvidia.OpenCodeReasoning-Nemotron-32B-IOI-GGUF

Text Generation • Updated about 4 hours ago • 1

csabakecskemeti

published a model about 9 hours ago

DevQuasar/nvidia.OpenCodeReasoning-Nemotron-32B-IOI-GGUF

Text Generation • Updated about 4 hours ago • 1

csabakecskemeti

updated a model about 9 hours ago

DevQuasar/nvidia.OpenCodeReasoning-Nemotron-32B-GGUF

Text Generation • Updated about 9 hours ago • 34 • 1

csabakecskemeti

published a model about 16 hours ago

DevQuasar/nvidia.OpenCodeReasoning-Nemotron-32B-GGUF

Text Generation • Updated about 9 hours ago • 34 • 1

csabakecskemeti

published a model about 18 hours ago

DevQuasar/nvidia.OpenCodeReasoning-Nemotron-14B-GGUF

Text Generation • Updated about 3 hours ago • 25 • 1

csabakecskemeti

updated a model about 18 hours ago

DevQuasar/nvidia.OpenCodeReasoning-Nemotron-7B-GGUF

Text Generation • Updated about 18 hours ago • 42

csabakecskemeti

posted an update 30 days ago

Post

2039

Local Llama4 Maverick Q2
https://youtu.be/4F8g_LThli0?si=MGba2SUTHt6xYw3T
Quants uploading now

Big thanks to @ngxson !

csabakecskemeti

posted an update about 1 month ago

Post

1688

Why the 'how many r's in strawberry' prompt "breaks" llama4? :D

Quants DevQuasar/meta-llama.Llama-4-Scout-17B-16E-Instruct-GGUF

3 replies

csabakecskemeti

posted an update about 2 months ago

Post

3362

I'm collecting llama-bench results for inference with a llama 3.1 8B q4 and q8 reference models on varoius GPUs. The results are average of 5 executions.
The system varies (different motherboard and CPU ... but that probably that has little effect on the inference performance).

https://devquasar.com/gpu-gguf-inference-comparison/
the exact models user are in the page

I'd welcome results from other GPUs is you have access do anything else you've need in the post. Hopefully this is useful information everyone.

csabakecskemeti

posted an update about 2 months ago

Post

2385

Managed to get my hands on a 5090FE, it's beefy

| llama 8B Q8_0 | 7.95 GiB | 8.03 B | CUDA | 99 | pp512 | 12207.44 ± 481.67 |
| llama 8B Q8_0 | 7.95 GiB | 8.03 B | CUDA | 99 | tg128 | 143.18 ± 0.18 |

Comparison with others GPUs
http://devquasar.com/gpu-gguf-inference-comparison/

csabakecskemeti

posted an update about 2 months ago

Post

1817

GTC new model announcement now from Nvidia
nvidia/Llama-3_3-Nemotron-Super-49B-v1

GGUFs:
DevQuasar/nvidia.Llama-3_3-Nemotron-Super-49B-v1-GGUF

Enjoy!

csabakecskemeti

posted an update about 2 months ago

Post

575

Cohere Command-a Q2 quant
DevQuasar/CohereForAI.c4ai-command-a-03-2025-GGUF

6.7t/s on a 3gpu setup (4080 + 2x3090)

(q3, q4 currently uploading)

csabakecskemeti

posted an update 2 months ago

Post

829

Fine tuning on the edge. Pushing the MI100 to it's limits.
QWQ-32B 4bit QLORA fine tuning
VRAM usage 31.498G/31.984G :D

4 replies

csabakecskemeti

posted an update 2 months ago

Post

1972

-UPDATED-
4bit inference is working! The blogpost is updated with code snippet and requirements.txt
https://devquasar.com/uncategorized/all-about-amd-and-rocm/
-UPDATED-
I've played around with an MI100 and ROCm and collected my experience in a blogpost:
https://devquasar.com/uncategorized/all-about-amd-and-rocm/
Unfortunately I've could not make inference or training work with model loaded in 8bit or use BnB, but did everything else and documented my findings.

4 replies

csabakecskemeti

posted an update 2 months ago

Post

2799

Testing Training on AMD/ROCm the first time!

I've got my hands on an AMD Instinct MI100. It's about the same price used as a V100 but on paper has more TOPS (V100 14TOPS vs MI100 23TOPS) also the HBM has faster clock so the memory bandwidth is 1.2TB/s.
For quantized inference it's a beast (MI50 was also surprisingly fast)

For LORA training with this quick test I could not make the bnb config works so I'm running the FT on the fill size model.

Will share all the install, setup and setting I've learned in a blog post, together with the cooling shroud 3D design.

8 replies

csabakecskemeti

posted an update 3 months ago

Post

1630

I found if we apply the reasoning system prompt (that has been published on the NousResearch/DeepHermes-3-Llama-3-8B-Preview model card) other models are also react to it and start mimicking reasoning. Some better some worse. I've seen internal monologue and self questioning.

Here's a blogpost about it:
http://devquasar.com/ai/reasoning-system-prompt/

csabakecskemeti

posted an update 3 months ago

Post

1876

Check out my idea:
LLmaaS - Local LLM as a Service

With LLmaaS, I propose leveraging locally running LLMs as a service, providing a standardized way for websites to access and utilize them for LLM-powered operations directly on the user’s device.

Demo, code, more detailed description.
https://devquasar.com/llmaas/
https://github.com/csabakecskemeti/LLmaaS
https://youtu.be/OOWGr8jcP5Q

Call for contributors
Join me a develop the LLmaaS proxy to make this a generic purpose tool to leverage local LLMs on web. Build in security measures.
I'm looking for help to make the proxy more generic support multiple local LLM services without any change on the HTML side.
Also looking for ideas how to make the HTML par more modular and easy to use.

4 replies

AI & ML interests

Recent Activity

Team members 2

DevQuasar's activity