Update readme to include details on AIDE paper release and METR's evaluation
README.md
CHANGED
@@ -6,11 +6,13 @@
|
|
6 |
[](https://discord.gg/Rq7t8wnsuA) 
|
7 |
[](https://twitter.com/WecoAI) 
|
8 |
|
9 |
-
AIDE is an LLM agent that generates solutions for machine learning tasks just from natural language descriptions of the task.
|
10 |
|
11 |
AIDE is the state-of-the-art agent on OpenAI's [MLE-bench](https://arxiv.org/pdf/2410.07095), a benchmark composed of 75 Kaggle machine learning tasks, where we achieved four times more medals compared to the runner-up agent architecture.
|
12 |
|
13 |
-
|
|
|
|
|
14 |
|
15 |
More specifically, AIDE has the following features:
|
16 |
|
@@ -246,3 +248,18 @@ At its core, Solution Space Tree Search consists of three main components:
|
|
246 |
By repeatedly applying these steps, AIDE navigates the vast space of possible solutions, progressively refining its approach until it converges on the optimal solution for the given data science problem.
|
247 |
|
248 |

6 |
[](https://discord.gg/Rq7t8wnsuA) 
|
7 |
[](https://twitter.com/WecoAI) 
|
8 |
|
9 |
+
AIDE is an LLM agent that generates solutions for machine learning tasks just from natural language descriptions of the task. This repository implements the AIDE agent described in our paper, [AIDE: AI-Driven Exploration in the Space of Code](https://arxiv.org/pdf/2502.13138). For a quick summary of the method and results, we recommend checking out the [project page](https://www.aide.ml) and [technical report](https://www.weco.ai/blog/technical-report).
|
10 |
|
11 |
AIDE is the state-of-the-art agent on OpenAI's [MLE-bench](https://arxiv.org/pdf/2410.07095), a benchmark composed of 75 Kaggle machine learning tasks, where we achieved four times more medals compared to the runner-up agent architecture.
|
12 |
|
13 |
+
METR's [RE-Bench](https://arxiv.org/pdf/2411.15114) shows that AIDE is not only capable on machine learning tasks but also generalizes to AI R&D tasks such as optimizing low-level Triton kernels and fine-tuning GPT-2 for QA, even surpassing the performance of human experts.
|
14 |
+
|
15 |
+
On our own benchmark of over 60 Kaggle data science competitions, AIDE outperformed 50% of Kaggle participants on average.
|
16 |
|
17 |
More specifically, AIDE has the following features:
|
18 |
|
|
|
248 |
By repeatedly applying these steps, AIDE navigates the vast space of possible solutions, progressively refining its approach until it converges on the optimal solution for the given data science problem.
|
249 |
|
250 |

|
251 |
+
|
252 |
+
# Citation
|
253 |
+
|
254 |
+
If you use AIDE in your work, please cite the following paper:
|
255 |
+
```bibtex
|
256 |
+
@misc{aide2025,
|
257 |
+
title={AIDE: AI-Driven Exploration in the Space of Code},
|
258 |
+
author={Zhengyao Jiang and Dominik Schmidt and Dhruv Srikanth and Dixing Xu and Ian Kaplan and Deniss Jacenko and Yuxiang Wu},
|
259 |
+
year={2025},
|
260 |
+
eprint={2502.13138},
|
261 |
+
archivePrefix={arXiv},
|
262 |
+
primaryClass={cs.AI},
|
263 |
+
url={https://arxiv.org/abs/2502.13138},
|
264 |
+
}
|
265 |
+
```
|