RLAIF

AI & ML interests

None defined yet.

RLAIF's activity

sea-snell

authored a paper 15 days ago

Learning Adaptive Parallel Reasoning with Language Models

Paper • 2504.15466 • Published 17 days ago • 42

nlile

authored a paper 2 months ago

Big-Math: A Large-Scale, High-Quality Math Dataset for Reinforcement Learning in Language Models

Paper • 2502.17387 • Published Feb 24 • 6

Asap7772

authored a paper 2 months ago

Cognitive Behaviors that Enable Self-Improving Reasoners, or, Four Habits of Highly Effective STaRs

Paper • 2503.01307 • Published Mar 3 • 38

nlile

authored a paper 2 months ago

Cognitive Behaviors that Enable Self-Improving Reasoners, or, Four Habits of Highly Effective STaRs

Paper • 2503.01307 • Published Mar 3 • 38

violetxi

authored 2 papers 4 months ago

Hypothetical Minds: Scaffolding Theory of Mind for Multi-Agent Tasks with Large Language Models

Paper • 2407.07086 • Published Jul 9, 2024

Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Thought

Paper • 2501.04682 • Published Jan 8 • 97

LouisCastricato

authored a paper 4 months ago

Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Thought

Paper • 2501.04682 • Published Jan 8 • 97

nlile

authored a paper 4 months ago

Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Thought

Paper • 2501.04682 • Published Jan 8 • 97

Asap7772

authored a paper 4 months ago

Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Thought

Paper • 2501.04682 • Published Jan 8 • 97

nlile

authored a paper 7 months ago

Generative Reward Models

Paper • 2410.12832 • Published Oct 2, 2024 • 6

Asap7772

authored a paper 7 months ago

Adaptive Inference-Time Compute: LLMs Can Predict if They Can Do Better, Even Mid-Generation

Paper • 2410.02725 • Published Oct 3, 2024 • 1

Asap7772

authored 4 papers 9 months ago

sea-snell

authored 3 papers 9 months ago

LMRL Gym: Benchmarks for Multi-Turn Reinforcement Learning with Language Models

Paper • 2311.18232 • Published Nov 30, 2023 • 1

The False Promise of Imitating Proprietary LLMs

Paper • 2305.15717 • Published May 25, 2023 • 5

Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters

Paper • 2408.03314 • Published Aug 6, 2024 • 63

rmrafailov

authored a paper 10 months ago

PERSONA: A Reproducible Testbed for Pluralistic Alignment

Paper • 2407.17387 • Published Jul 24, 2024 • 20

nlile

authored a paper 10 months ago

PERSONA: A Reproducible Testbed for Pluralistic Alignment

Paper • 2407.17387 • Published Jul 24, 2024 • 20

Team members 10