Tensor-wise (TWQ) and layer-wise (LWQ) quantization now available in llama.cpp!

As of version b5125, users can perform TWQ, whereby a whole tensor type is quantized at a specific level, or LWQ, whereby specific layers of a tensor type are quantized at different levels.

The new --tensor-type option enables llama-quantize to apply user-defined quant levels to any combination of allowed tensors (i.e. tensors with two or more dimensions) and layer numbers, with support for regex patterns.

For example, to TWQ the Attention Value tensor you would use --tensor-type attn_v=q6_k, and to perform LWQ on selected layers you would use something like --tensor-type "\.([0-9]|1[01257]|31)\.attn_v=q4_k" (see the sketch below).
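As a rough sketch of how these overrides combine with a base quant type (the model file names here are hypothetical and the paths will depend on your setup), a full llama-quantize invocation might look like:

```
# TWQ: quantize all attn_v tensors at Q6_K on top of a Q4_K_M base
./llama-quantize --tensor-type attn_v=q6_k Model-8B-F16.gguf Model-8B-TWQ.gguf q4_k_m

# LWQ: quantize attn_v at Q4_K only for the layers matched by the regex
./llama-quantize --tensor-type "\.([0-9]|1[01257]|31)\.attn_v=q4_k" Model-8B-F16.gguf Model-8B-LWQ.gguf q4_k_m
```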

Over the next few days/weeks I'll update the models in my HF repo (and add some others), but eaddario/DeepSeek-R1-Distill-Llama-8B-GGUF and eaddario/DeepSeek-R1-Distill-Qwen-7B-GGUF have already been LWQed.

For reference, compared to the naive Q4_K_M model, the LWQ Qwen-7B is almost 11% smaller (4.18 GB vs 4.68 GB) with only a 0.35% penalty on PPL!

I'll update the https://medium.com/@eaddario/squeezing-tensor-bits-the-quest-for-smaller-llms-86b23bd052ca post to explain the process in detail, but in the meantime the following links will provide some background:

- Changes to llama-quantize: https://github.com/ggml-org/llama.cpp/pull/12511
- TWQ & LWQ tests: https://github.com/ggml-org/llama.cpp/discussions/12741
- Modified llama-imatrix (not yet merged) used to generate imatrix statistics to guide the TWQ and LWQ process: https://github.com/ggml-org/llama.cpp/pull/12718