---
title: CircleGuardBench
emoji: ⚪
colorFrom: gray
colorTo: indigo
sdk: gradio
sdk_version: 4.44.1
app_file: app.py
pinned: true
short_description: First benchmark testing LLM guards on safety and accuracy.
models:
- AtlaAI/Selene-1-Mini-Llama-3.1-8B
- google/gemma-3-12b-it
- google/gemma-3-4b-it
- meta-llama/Llama-3.1-8B-Instruct
- meta-llama/Llama-3.2-3B-Instruct
- meta-llama/Llama-4-Maverick-17B-128E-Instruct
- meta-llama/Llama-4-Scout-17B-16E-Instruct
- meta-llama/Llama-Guard-3-1B
- meta-llama/Llama-Guard-3-8B
- meta-llama/Llama-Guard-4-12B
- mistralai/Ministral-8B-Instruct-2410
- mistralai/Mistral-Small-3.1-24B-Instruct-2503
- Qwen/Qwen2.5-7B-Instruct
- Qwen/Qwen3-0.6B
- Qwen/Qwen3-1.7B
- Qwen/Qwen3-4B
- Qwen/Qwen3-8B
---
# CircleGuardBench Leaderboard
CircleGuardBench is a first-of-its-kind benchmark for evaluating the protection capabilities of large language model (LLM) guard systems. It tests how well guard models block harmful content, resist jailbreaks, avoid false positives, and operate efficiently in real-time environments.