lewtun (HF Staff) committed on
Commit 097f572 · verified · 1 Parent(s): 00bc147

Add LCB evals

Files changed (1): README.md (+12 -0)

README.md CHANGED
@@ -26,9 +26,21 @@ OlympicCoder-7B is a code model that achieves strong performance on competitive
 
 ## Evaluation
 
+We compare the performance of OlympicCoder models on two main benchmarks for competitive coding:
+
+* **[IOI'2024:](https://github.com/huggingface/ioi)** 6 very challenging problems from the 2024 International Olympiad in Informatics. Models are allowed up to 50 submissions per problem.
+* **[LiveCodeBench:](https://livecodebench.github.io)** Python programming problems sourced from platforms like CodeForces and LeetCode. We use the `v4_v5` subset of [`livecodebench/code_generation_lite`](https://huggingface.co/datasets/livecodebench/code_generation_lite), which corresponds to 268 problems. We use `lighteval` to evaluate models on LiveCodeBench with the sampling parameters described [here](https://github.com/huggingface/open-r1?tab=readme-ov-file#livecodebench).
+
+> [!NOTE]
+> The OlympicCoder models were post-trained exclusively on C++ solutions generated by DeepSeek-R1. As a result, performance on LiveCodeBench should be considered partially _out-of-domain_, since this benchmark expects models to output solutions in Python.
+
+### IOI'24
+
+![](./ioi-evals.png)
+
+### LiveCodeBench
+
+![](./lcb-evals.png)
 
 ## Usage
 Here's how you can run the model using the `pipeline()` function from 🤗 Transformers:
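The usage snippet itself is truncated in this diff. A minimal sketch of what such a `pipeline()` call could look like is shown below; the checkpoint id `open-r1/OlympicCoder-7B` is an assumption, as is the example prompt:

```python
import torch
from transformers import pipeline


def generate(prompt: str,
             model_id: str = "open-r1/OlympicCoder-7B",  # assumed checkpoint id
             max_new_tokens: int = 512) -> str:
    """Run the model via the high-level Transformers pipeline API and return the reply."""
    pipe = pipeline(
        "text-generation",
        model=model_id,
        torch_dtype=torch.bfloat16,
        device_map="auto",
    )
    messages = [{"role": "user", "content": prompt}]
    out = pipe(messages, max_new_tokens=max_new_tokens)
    # The pipeline returns the full chat transcript; the last turn is the model's reply.
    return out[0]["generated_text"][-1]["content"]


if __name__ == "__main__":
    print(generate("Write a C++ program that prints the first 10 Fibonacci numbers."))
```

Since the model was trained on C++ reasoning traces, prompting for C++ solutions (as above) keeps it in-domain.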