dSrikanth committed on
Commit c38c76f · 1 Parent(s): 6191534

Update readme to include details on AIDE paper release and METR's evaluation

Files changed (1)
  1. README.md +19 -2
README.md CHANGED
@@ -6,11 +6,13 @@
  [![Discord](https://dcbadge.vercel.app/api/server/Rq7t8wnsuA?compact=true&style=flat)](https://discord.gg/Rq7t8wnsuA)
  [![Twitter Follow](https://img.shields.io/twitter/follow/WecoAI?style=social)](https://twitter.com/WecoAI)
 
- AIDE is an LLM agent that generates solutions for machine learning tasks just from natural language descriptions of the task.
 
  AIDE is the state-of-the-art agent on OpenAI's [MLE-bench](https://arxiv.org/pdf/2410.07095), a benchmark composed of 75 Kaggle machine learning tasks, where we achieved four times more medals compared to the runner-up agent architecture.
 
- In our own benchmark composed of over 60 Kaggle data science competitions, AIDE demonstrated impressive performance, surpassing 50% of Kaggle participants on average (see our [technical report](https://www.weco.ai/blog/technical-report) for details).
 
  More specifically, AIDE has the following features:
 
@@ -246,3 +248,18 @@ At its core, Solution Space Tree Search consists of three main components:
  By repeatedly applying these steps, AIDE navigates the vast space of possible solutions, progressively refining its approach until it converges on the optimal solution for the given data science problem.
 
  ![Tree Search Visualization](https://github.com/WecoAI/aideml/assets/8918572/2401529c-b97e-4029-aed2-c3f376f54c3c)
 
  [![Discord](https://dcbadge.vercel.app/api/server/Rq7t8wnsuA?compact=true&style=flat)](https://discord.gg/Rq7t8wnsuA)
  [![Twitter Follow](https://img.shields.io/twitter/follow/WecoAI?style=social)](https://twitter.com/WecoAI)
 
+ AIDE is an LLM agent that generates solutions for machine learning tasks just from natural language descriptions of the task. This repository implements the AIDE agent described in our paper, [AIDE: AI-Driven Exploration in the Space of Code](https://arxiv.org/pdf/2502.13138). We recommend checking out the [project page](https://www.aide.ml) and [technical report](https://www.weco.ai/blog/technical-report) for a quick summary of the method and results.
 
  AIDE is the state-of-the-art agent on OpenAI's [MLE-bench](https://arxiv.org/pdf/2410.07095), a benchmark composed of 75 Kaggle machine learning tasks, where we achieved four times more medals compared to the runner-up agent architecture.
 
+ METR's [RE-Bench](https://arxiv.org/pdf/2411.15114) shows that AIDE is not only capable at machine learning tasks but also generalizes to AI R&D tasks such as optimizing low-level Triton kernels and fine-tuning GPT-2 for QA, even surpassing the performance of human experts.
+
+ In our own benchmark composed of over 60 Kaggle data science competitions, AIDE demonstrated impressive performance, surpassing 50% of Kaggle participants on average.
 
  More specifically, AIDE has the following features:
 
 
  By repeatedly applying these steps, AIDE navigates the vast space of possible solutions, progressively refining its approach until it converges on the optimal solution for the given data science problem.
 
  ![Tree Search Visualization](https://github.com/WecoAI/aideml/assets/8918572/2401529c-b97e-4029-aed2-c3f376f54c3c)
+
+ # Citation
+
+ If you use AIDE in your work, please cite the following paper:
+ ```bibtex
+ @misc{aide2025,
+ title={AIDE: AI-Driven Exploration in the Space of Code},
+ author={Zhengyao Jiang and Dominik Schmidt and Dhruv Srikanth and Dixing Xu and Ian Kaplan and Deniss Jacenko and Yuxiang Wu},
+ year={2025},
+ eprint={2502.13138},
+ archivePrefix={arXiv},
+ primaryClass={cs.AI},
+ url={https://arxiv.org/abs/2502.13138},
+ }
+ ```