Update app.py
app.py CHANGED
@@ -11,7 +11,7 @@ HF_TOKEN = os.environ.get("HF_TOKEN", None)
 
 DESCRIPTION = '''
 <div>
-<h1 style="text-align: center;">JudgeLRM
+<h1 style="text-align: center;">JudgeLRM</h1>
 <p>This Space demonstrates the <a href="https://huggingface.co/nuojohnchen/JudgeLRM-7B"><b>JudgeLRM</b></a> model, designed to evaluate the quality of two AI assistant responses. JudgeLRM is a family of judgment-oriented LLMs trained using reinforcement learning (RL) with judge-wise, outcome-driven rewards. JudgeLRM models consistently outperform both SFT-tuned and state-of-the-art reasoning models. Notably, JudgeLRM-3B surpasses GPT-4, and JudgeLRM-7B outperforms DeepSeek-R1 by 2.79\% in F1 score, particularly excelling in judge tasks requiring deep reasoning.</p>
 <p>Enter an instruction and two responses, and the model will think, reason and score them on a scale of 1-10 (higher is better).</p>
 <p>You can also select Hugging Face models to automatically generate responses for evaluation.</p>
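The rest of app.py is not shown in this diff, so the following is only a minimal sketch of how a DESCRIPTION constant like the one edited above is typically rendered in a Gradio Space; the gr.Blocks layout, the component labels, and the judge_responses helper are assumptions for illustration, not the Space's actual code:

import gradio as gr

# Abbreviated version of the DESCRIPTION string edited in the diff above.
DESCRIPTION = '''
<div>
<h1 style="text-align: center;">JudgeLRM</h1>
<p>Enter an instruction and two responses, and the model will score them on a scale of 1-10.</p>
</div>
'''

def judge_responses(instruction: str, response_a: str, response_b: str) -> str:
    # Hypothetical placeholder: the real Space would call the JudgeLRM-7B model
    # (https://huggingface.co/nuojohnchen/JudgeLRM-7B) here to reason about the
    # two responses and emit 1-10 scores. Name and signature are assumptions.
    return "Model reasoning and scores would appear here."

with gr.Blocks() as demo:
    gr.HTML(DESCRIPTION)  # render the HTML description at the top of the Space
    instruction = gr.Textbox(label="Instruction")
    response_a = gr.Textbox(label="Response A")
    response_b = gr.Textbox(label="Response B")
    output = gr.Textbox(label="Judgment (scores 1-10)")
    gr.Button("Judge").click(
        judge_responses,
        inputs=[instruction, response_a, response_b],
        outputs=output,
    )

if __name__ == "__main__":
    demo.launch()

In the actual Space, the judge function would presumably load nuojohnchen/JudgeLRM-7B (for example via transformers) and parse the 1-10 scores out of the model's generated reasoning; that wiring is outside the scope of this diff.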