"""
Text content for the GuardBench Leaderboard.
"""
|
TITLE = """
<div style="text-align: center; margin-bottom: 1rem">
<h1>GuardBench Leaderboard</h1>
</div>
"""
|
INTRODUCTION_TEXT = """
## Introduction

GuardBench is a comprehensive benchmark for evaluating the safety guardrails of large language models (LLMs).
This leaderboard tracks model performance across a range of safety categories, including harmful content
detection and jailbreak resistance.

Models are evaluated on their ability to properly refuse harmful requests and to detect problematic content
across multiple categories and test scenarios.
"""
|
LLM_BENCHMARKS_TEXT = """
## GuardBench Evaluation Methodology

GuardBench evaluates models across multiple safety categories:

- **Criminal, Violent, and Terrorist Activity**
- **Manipulation, Deception, and Misinformation**
- **Creative Content Involving Illicit Themes**
- **Sexual Content and Violence**
- **Political Corruption and Legal Evasion**
- **Labor Exploitation and Human Trafficking**
- **Environmental and Industrial Harm**
- **Animal Cruelty and Exploitation**
- **Self-Harm and Suicidal Ideation**
- **Safe Prompts**

Each category is tested with:

- **Default Prompts**: Standard harmful requests
- **Jailbroken Prompts**: Adversarial attempts to bypass safety measures
- **Default Answers**: Evaluation of harmful content in model responses
- **Jailbroken Answers**: Evaluation of harmful content in responses to jailbreak attempts

Metrics include (an illustrative computation is shown below):

- **F1 Binary**: Harmonic mean of precision and recall
- **Recall Binary**: Proportion of actual harmful content correctly identified
- **Precision Binary**: Proportion of content flagged as harmful that was actually harmful
- **Error Ratio**: Rate of errors during evaluation
- **Average Runtime**: Average processing time in milliseconds
|
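The binary metrics follow the standard confusion-matrix definitions. The sketch below is illustrative only
(variable names are hypothetical and not part of the GuardBench API):

```python
# Illustrative only: how the binary metrics relate to confusion-matrix counts.
# 1 = harmful, 0 = safe; `labels` and `predictions` are hypothetical names.
labels      = [1, 1, 0, 1, 0, 0, 1, 0]
predictions = [1, 0, 0, 1, 0, 1, 1, 0]

tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
fp = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 1)
fn = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 0)

precision = tp / (tp + fp) if tp + fp else 0.0   # Precision Binary
recall    = tp / (tp + fn) if tp + fn else 0.0   # Recall Binary
f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0  # F1 Binary
```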
""" |
|
|
|
EVALUATION_QUEUE_TEXT = """
## Submission Process

To submit your model results to the GuardBench leaderboard:

1. Evaluate your model using the [GuardBench framework](https://github.com/huggingface/guard-bench)
2. Format your results as a JSONL file according to our schema (an illustrative sketch is shown below)
3. Submit your results using the submission form with your authorized token
|
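For illustration only, a results file could be written along the following lines. The field names here are
hypothetical; the authoritative schema is documented in the GuardBench repository:

```python
import json

# Hypothetical per-sample records: these field names are illustrative only,
# not the official GuardBench schema (see the repository for the real format).
results = [
    {"prompt_id": "demo-001", "category": "Safe Prompts", "prediction": "safe"},
    {"prompt_id": "demo-002", "category": "Self-Harm and Suicidal Ideation", "prediction": "unsafe"},
]

# JSONL: one JSON object per line.
with open("results.jsonl", "w", encoding="utf-8") as f:
    for record in results:
        print(json.dumps(record), file=f)
```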
|
|
Results will be processed and added to the leaderboard once validated.
"""
|
CITATION_BUTTON_LABEL = "Cite GuardBench"
|
CITATION_BUTTON_TEXT = """
@misc{guardbench2023,
  author       = {GuardBench Team},
  title        = {GuardBench: Comprehensive Benchmark for LLM Safety Guardrails},
  year         = {2023},
  publisher    = {GitHub},
  howpublished = {\\url{https://github.com/huggingface/guard-bench}},
  note         = {GitHub repository}
}
"""
|