Wolfram Ravenwolf

wolfram

AI & ML interests

Local LLMs

Recent Activity

Organizations

ellamind · Blog-explorers · open/ acc

wolfram's activity

posted an update about 9 hours ago
Post
368
Finally finished my extensive **Qwen 3 evaluations** across a range of formats and quantisations, focusing on **MMLU-Pro** (Computer Science).

A few take-aways stood out - especially for those interested in local deployment and performance trade-offs:

1️⃣ **Qwen3-235B-A22B** (via Fireworks API) tops the table at **83.66%** with ~55 tok/s.
2️⃣ But the **30B-A3B Unsloth** quant delivered **82.20%** while running locally at ~45 tok/s and with zero API spend.
3️⃣ The same Unsloth build is ~5x faster than Qwen's **Qwen3-32B**, which scores **82.20%** as well yet crawls at <10 tok/s.
4️⃣ On Apple silicon, the **30B MLX** port hits **79.51%** while sustaining ~64 tok/s - arguably today's best speed/quality trade-off for Mac setups.
5️⃣ The **0.6B** micro-model races above 180 tok/s but tops out at **37.56%** - that's why it's not even on the graph (which cuts off below 50% accuracy).

All local runs were done with LM Studio on an M4 MacBook Pro, using Qwen's official recommended settings.

**Conclusion:** Quantised 30B models now get you ~98% of frontier-class accuracy (82.20% vs. 83.66% on this benchmark) - at a fraction of the latency, cost, and energy. For most local RAG or agent workloads, they're not just good enough - they're the new default.
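
For anyone who wants to check the "~98%" figure, here is a minimal sketch of the arithmetic. It uses only the accuracy numbers quoted above. The dictionary labels and the `relative_accuracy` helper are illustrative shorthand, not official model IDs or part of any benchmark tooling.

```python
# MMLU-Pro (Computer Science) accuracy in %, as quoted in this post.
# The keys are shorthand labels chosen for this sketch.
scores = {
    "Qwen3-235B-A22B (Fireworks API)": 83.66,
    "Qwen3-30B-A3B (Unsloth quant)": 82.20,
    "Qwen3-32B": 82.20,
    "Qwen3-30B-A3B (MLX)": 79.51,
    "Qwen3 0.6B": 37.56,
}


def relative_accuracy(scores, baseline):
    """Express every score as a percentage of the baseline model's score."""
    base = scores[baseline]
    return {name: round(100 * s / base, 2) for name, s in scores.items()}


rel = relative_accuracy(scores, "Qwen3-235B-A22B (Fireworks API)")
for name, pct in rel.items():
    print(f"{name}: {pct}% of the top score")
```

The 30B-A3B Unsloth quant and Qwen3-32B both land at roughly 98% of the 235B score, which is where the figure in the conclusion comes from.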

Well done, Qwen - you really whipped the llama's ass! And to OpenAI: for your upcoming open model, please make it MoE, with toggleable reasoning, and release it in many sizes. *This* is the future!
  • 2 replies
New activity in mlx-community/Qwen3-30B-A3B-4bit about 9 hours ago
New activity in wolfram/Athene-V2-Chat-4.65bpw-h6-exl2 8 days ago

Improve language tag

1
#1 opened 10 days ago by
lbourdois
upvoted an article 3 months ago
Article

Welcome to Inference Providers on the Hub 🔥

478
published an article 4 months ago
Article

🐺🐦‍⬛ LLM Comparison/Test: Phi-4, Qwen2 VL 72B Instruct, Aya Expanse 32B in my updated MMLU-Pro CS benchmark

By wolfram
5
New activity in blog-explorers/README 4 months ago

[Support] Community Articles

1
83
#5 opened about 1 year ago by
victor
published an article 4 months ago
Article

🐺🐦‍⬛ LLM Comparison/Test: DeepSeek-V3, QVQ-72B-Preview, Falcon3 10B, Llama 3.3 70B, Nemotron 70B in my updated MMLU-Pro CS benchmark

By wolfram
41
published an article 5 months ago
Article

🐺🐦‍⬛ LLM Comparison/Test: 25 SOTA LLMs (including QwQ) through 59 MMLU-Pro CS benchmark runs

By wolfram
79
published an article 5 months ago
Article

Turning Home Assistant into an AI Powerhouse: Amy's Guide

By wolfram
3