- DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines
  Paper • 2310.03714 • Published • 34
- ReST meets ReAct: Self-Improvement for Multi-Step Reasoning LLM Agent
  Paper • 2312.10003 • Published • 42
- AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation Framework
  Paper • 2308.08155 • Published • 8
- GAIA: a benchmark for General AI Assistants
  Paper • 2311.12983 • Published • 207
Collections including paper arxiv:2311.12983

- If LLM Is the Wizard, Then Code Is the Wand: A Survey on How Code Empowers Large Language Models to Serve as Intelligent Agents
  Paper • 2401.00812 • Published • 10
- DynaSaur: Large Language Agents Beyond Predefined Actions
  Paper • 2411.01747 • Published • 34
- GAIA: a benchmark for General AI Assistants
  Paper • 2311.12983 • Published • 207
- Executable Code Actions Elicit Better LLM Agents
  Paper • 2402.01030 • Published • 133

- GAIA: a benchmark for General AI Assistants
  Paper • 2311.12983 • Published • 207
- Zephyr: Direct Distillation of LM Alignment
  Paper • 2310.16944 • Published • 122
- SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model
  Paper • 2502.02737 • Published • 229
- Global MMLU: Understanding and Addressing Cultural and Linguistic Biases in Multilingual Evaluation
  Paper • 2412.03304 • Published • 19

- MLGym: A New Framework and Benchmark for Advancing AI Research Agents
  Paper • 2502.14499 • Published • 192
- SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines
  Paper • 2502.14739 • Published • 103
- How Much Knowledge Can You Pack into a LoRA Adapter without Harming LLM?
  Paper • 2502.14502 • Published • 91
- PC-Agent: A Hierarchical Multi-Agent Collaboration Framework for Complex Task Automation on PC
  Paper • 2502.14282 • Published • 20