Wolfram Ravenwolf
wolfram
AI & ML interests
Local LLMs
Recent Activity
Posted an update about 16 hours ago
Finally finished my extensive **Qwen 3 evaluations** across a range of formats and quantisations, focusing on **MMLU-Pro** (Computer Science).
A few take-aways stood out - especially for those interested in local deployment and performance trade-offs:
1️⃣ **Qwen3-235B-A22B** (via Fireworks API) tops the table at **83.66%** with ~55 tok/s.
2️⃣ But the **30B-A3B Unsloth** quant delivered **82.20%** while running locally at ~45 tok/s and with zero API spend.
3️⃣ The same Unsloth build is ~5x faster than Qwen's **Qwen3-32B**, which also scores **82.20%** yet crawls at <10 tok/s (ratios recomputed in the sketch after this list).
4️⃣ On Apple silicon, the **30B MLX** port hits **79.51%** while sustaining ~64 tok/s - arguably today's best speed/quality trade-off for Mac setups.
5️⃣ The **0.6B** micro-model races above 180 tok/s but tops out at **37.56%** - that's why it's not even on the graph (50% performance cut-off).
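To make the trade-offs concrete, here's a minimal Python sketch that tabulates the scores above and recomputes the headline ratios. All numbers are the measurements from this post; the 9 tok/s figure for the dense 32B is an assumed stand-in for "<10 tok/s".

```python
# Minimal sketch: recompute the headline ratios from the MMLU-Pro (CS) runs.
# Scores/speeds are from this post; 9 tok/s is assumed from "<10 tok/s".
results = [
    # (model, MMLU-Pro CS score %, tok/s, runs locally?)
    ("Qwen3-235B-A22B (Fireworks API)", 83.66,  55, False),
    ("Qwen3-30B-A3B (Unsloth quant)",   82.20,  45, True),
    ("Qwen3-32B (dense)",               82.20,   9, True),
    ("Qwen3-30B-A3B (MLX)",             79.51,  64, True),
    ("Qwen3-0.6B",                      37.56, 180, True),
]

frontier = max(score for _, score, _, _ in results)  # 83.66
for name, score, tps, local in results:
    rel = 100 * score / frontier
    where = "local" if local else "API"
    print(f"{name:33} {score:6.2f}%  {tps:3d} tok/s  {rel:5.1f}% of frontier  [{where}]")

# 30B-A3B quant: 82.20 / 83.66 ≈ 98.3% of frontier accuracy,
# at 45 / 9 = 5x the throughput of the dense 32B.
```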
All local runs were done with LM Studio on an M4 MacBook Pro, using Qwen's official recommended settings.
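If you want to reproduce a run: LM Studio exposes an OpenAI-compatible server (default http://localhost:1234/v1), so each MMLU-Pro question can be sent from a few lines of Python. Here's a minimal sketch - the model identifier is an assumption (check what `lms ls` reports on your machine), and the sampling values are my reading of Qwen's recommended thinking-mode settings, so verify them against the model card.

```python
# Minimal sketch: score one MMLU-Pro-style question against a local
# LM Studio server via its OpenAI-compatible endpoint.
from openai import OpenAI

# LM Studio's default local endpoint; the api_key can be any string.
client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

question = (
    "Which data structure gives O(1) average-case lookup by key?\n"
    "(A) linked list (B) hash table (C) binary heap (D) B-tree\n"
    "Answer with the letter only."
)

resp = client.chat.completions.create(
    model="qwen3-30b-a3b",  # assumed identifier - check `lms ls`
    messages=[{"role": "user", "content": question}],
    temperature=0.6,        # Qwen's recommended thinking-mode settings,
    top_p=0.95,             # as I recall them - verify on the model card
    max_tokens=2048,        # leave headroom for the <think> block
)
print(resp.choices[0].message.content)
```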
**Conclusion:** Quantised 30B models now get you ~98% of frontier-class accuracy - at a fraction of the latency, cost, and energy. For most local RAG or agent workloads, they're not just good enough - they're the new default.
Well done, Qwen - you really whipped the llama's ass! And to OpenAI: for your upcoming open model, please make it MoE, with toggleable reasoning, and release it in many sizes. *This* is the future!
New activity (about 16 hours ago) on mlx-community/Qwen3-30B-A3B-4bit: Jinja chat template error on lmstudio
New activity (8 days ago) on wolfram/Athene-V2-Chat-4.65bpw-h6-exl2: Improve language tag
wolfram's activity
Jinja chat template error on lmstudio (8) · #1 opened 9 days ago by furyzhenxi
Improve language tag (1) · #1 opened 10 days ago by lbourdois
[Support] Community Articles (1 / 83) · #5 opened about 1 year ago by victor
The tokenizer has changed just fyi (12) · #2 opened 10 months ago by bullerwins
no system message? (9) · #14 opened 12 months ago by mclassHF2023
Concerns regarding Prompt Format (6) · #1 opened about 1 year ago by wolfram
Strange observation: model becomes super horny in ST's MinP mode (5) · #7 opened about 1 year ago by deleted
Upload folder using huggingface_hub (2) · #3 opened about 1 year ago by wolfram
VRAM Estimates (6) · #3 opened about 1 year ago by ernestr
Merge method (1) · #4 opened about 1 year ago by dnhkng
Can't wait to test (5) · #4 opened about 1 year ago by froggeric
Kindly asking for quants (7) · #2 opened about 1 year ago by wolfram
GPTQ / AWQ (1) · #2 opened about 1 year ago by agahebr
Guidance on GPU VRAM Split? (5) · #3 opened over 1 year ago by nmitchko
Upload folder using huggingface_hub (2) · #1 opened about 1 year ago by wolfram
Very interesting that miqu will give 16k context work even only first layer and last layer (1 / 15) · #2 opened about 1 year ago by akoyaki
iMatrix, IQ2_XS & IQ2_XXS (13) · #2 opened about 1 year ago by Nexesenex
Performance (13) · #2 opened over 1 year ago by KnutJaegersberg
VRAM requirements (9) · #1 opened about 1 year ago by sophosympatheia
benchmarks? (1) · #1 opened over 1 year ago by distantquant