UltraRonin committed on
Commit
ac33f17
·
1 Parent(s): 3a8cf08
Files changed (1)
  1. src/about.py +2 -0
src/about.py CHANGED
@@ -43,6 +43,8 @@ TITLE = """<h1 align="center" id="space-title">LR<sup>2</sup>Bench: Evaluating L
 # What does your leaderboard evaluate?
 INTRODUCTION_TEXT = """
 <strong>LR<sup>2</sup>Bench</strong> is a novel benchmark designed to evaluate the <strong>L</strong>ong-chain <strong>R</strong>eflective <strong>R</strong>easoning capabilities of LLMs. LR<sup>2</sup>Bench comprises 850 samples across six Constraint Satisfaction Problems (CSPs) where reflective reasoning is crucial for deriving solutions that meet all given constraints. Each type of task focuses on distinct constraint patterns, such as knowledge-based, logical, and spatial constraints, providing a comprehensive evaluation of diverse problem-solving scenarios.
+
+<strong>Note:</strong> We have released the LR<sup>2</sup>Bench dataset <a href="https://github.com/Ultramarine-spec/LR2Bench">here</a>. For evaluation, you can submit your model's answers here following the submission guidelines. The leaderboard will automatically evaluate performance with rule-based matching. If you have further questions, please feel free to contact us at <a href="mailto:[email protected]">[email protected]</a>.
 """
 
 TASK_TEXT = {