Add LCB evals
README.md CHANGED
@@ -26,9 +26,21 @@ OlympicCoder-7B is a code model that achieves strong performance on competitive
## Evaluation

+We compare the performance of OlympicCoder models on two main benchmarks for competitive coding:
+
+* **[IOI'2024](https://github.com/huggingface/ioi):** 6 very challenging problems from the 2024 International Olympiad in Informatics. Models are allowed up to 50 submissions per problem.
+* **[LiveCodeBench](https://livecodebench.github.io):** Python programming problems sourced from platforms like CodeForces and LeetCode. We use the `v4_v5` subset of [`livecodebench/code_generation_lite`](https://huggingface.co/datasets/livecodebench/code_generation_lite), which corresponds to 268 problems. We use `lighteval` to evaluate models on LiveCodeBench with the sampling parameters described [here](https://github.com/huggingface/open-r1?tab=readme-ov-file#livecodebench); a command sketch follows the note below.
+
+> [!NOTE]
+> The OlympicCoder models were post-trained exclusively on C++ solutions generated by DeepSeek-R1. As a result, performance on LiveCodeBench should be considered partially _out-of-domain_, since this benchmark expects models to output solutions in Python.
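For convenience, here is a minimal sketch of what that `lighteval` invocation looks like, adapted from the open-r1 instructions linked above. The task string, flags, and sampling values are assumptions based on that README and may differ across `lighteval` versions; treat the linked instructions as authoritative.

```shell
# Sketch: evaluate OlympicCoder-7B on LiveCodeBench with lighteval + vLLM.
# Task name, flags, and sampling values follow the open-r1 README linked
# above and may vary by lighteval version -- verify before running.
MODEL=open-r1/OlympicCoder-7B
MODEL_ARGS="pretrained=$MODEL,dtype=bfloat16,max_model_length=32768,gpu_memory_utilization=0.8,generation_parameters={max_new_tokens:32768,temperature:0.6,top_p:0.95}"

lighteval vllm "$MODEL_ARGS" "extended|lcb:codegeneration|0|0" \
    --use-chat-template \
    --output-dir data/evals/$MODEL
```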
+
+### IOI'24
+
(IOI'24 results plot)

+### LiveCodeBench

+(LiveCodeBench results plot)

## Usage
Here's how you can run the model using the `pipeline()` function from 🤗 Transformers:
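The snippet itself sits just below this hunk in the README; as a rough, self-contained sketch of such a `pipeline()` call (the prompt and sampling values here are illustrative, not necessarily the card's exact ones):

```python
# Sketch: run OlympicCoder-7B with the Transformers text-generation pipeline.
# Requires: pip install transformers accelerate
import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="open-r1/OlympicCoder-7B",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Chat-style input; the model produces a long reasoning trace before the code.
messages = [
    {"role": "user", "content": "Write a C++ program that prints the 10th Fibonacci number."},
]
outputs = pipe(messages, max_new_tokens=8192, do_sample=True, temperature=0.6, top_p=0.95)

# generated_text holds the full conversation; the last message is the reply.
print(outputs[0]["generated_text"][-1]["content"])
```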