vwxyzjn's Collections

TL;DR summarization checkpoints

updated Aug 1, 2024

The checkpoints were trained for the paper https://arxiv.org/abs/2403.17031 and are taken from https://wandb.ai/costa-huang/tldr_summarize/reports/Release--Vmlldzo3MT

  • cleanrl/EleutherAI_pythia-1b-deduped__sft__tldr

    Text Generation • Updated May 15, 2024 • 3.12k

  • cleanrl/EleutherAI_pythia-1b-deduped__reward__tldr

    Text Classification • Updated May 15, 2024 • 2.85k

  • cleanrl/EleutherAI_pythia-2.8b-deduped__sft__tldr

    Text Generation • Updated May 15, 2024 • 196

  • cleanrl/EleutherAI_pythia-2.8b-deduped__reward__tldr

    Text Classification • Updated May 15, 2024 • 172

  • cleanrl/EleutherAI_pythia-6.9b-deduped__sft__tldr

    Text Generation • Updated May 15, 2024 • 18

  • cleanrl/EleutherAI_pythia-6.9b-deduped__reward__tldr

    Text Classification • Updated May 7, 2024 • 130

  • cleanrl/EleutherAI_pythia-1b-deduped__ppo__tldr

    Text Generation • Updated May 30, 2024 • 17

  • cleanrl/EleutherAI_pythia-6.9b-deduped__ppo__tldr

    Text Generation • Updated May 30, 2024 • 6

  • cleanrl/EleutherAI_pythia-2.8b-deduped__ppo__tldr

    Text Generation • Updated May 30, 2024 • 9
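
The repository names listed above follow one pattern: the `cleanrl` organization, the base model `EleutherAI_pythia-<size>-deduped`, a training stage (`sft`, `reward`, or `ppo`), and the `tldr` task suffix. The sketch below rebuilds those names from that pattern. The helper names `repo_id`, `all_repo_ids`, and `load_generator` are hypothetical and not part of any library. `load_generator` further assumes that the `sft` and `ppo` checkpoints ship a tokenizer and load with the standard `transformers` auto classes, which this page does not confirm. The reward checkpoints are left out of the loader because they may need a custom model class.

```python
from itertools import product

ORG = "cleanrl"
SIZES = ("1b", "2.8b", "6.9b")      # Pythia sizes that appear in the listing
STAGES = ("sft", "reward", "ppo")   # training stages that appear in the listing


def repo_id(size: str, stage: str) -> str:
    """Build a repository name following the pattern seen in the listing."""
    if size not in SIZES:
        raise ValueError(f"unknown size {size!r}, expected one of {SIZES}")
    if stage not in STAGES:
        raise ValueError(f"unknown stage {stage!r}, expected one of {STAGES}")
    return f"{ORG}/EleutherAI_pythia-{size}-deduped__{stage}__tldr"


def all_repo_ids() -> list[str]:
    """Return the nine names in the collection (3 sizes x 3 stages)."""
    return [repo_id(size, stage) for size, stage in product(SIZES, STAGES)]


def load_generator(size: str, stage: str = "sft"):
    """Hypothetical loader for the text-generation checkpoints.

    Assumption: the sft/ppo repositories load with AutoModelForCausalLM
    and AutoTokenizer. Calling this downloads the model weights.
    """
    if stage == "reward":
        raise ValueError("reward checkpoints are not handled by this sketch")
    # Imported lazily so the name helpers work without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    name = repo_id(size, stage)
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForCausalLM.from_pretrained(name)
    return tokenizer, model


if __name__ == "__main__":
    for name in all_repo_ids():
        print(name)
```

Keeping the name construction separate from the loader means the list of repositories can be inspected without downloading any weights. The smallest checkpoints (`1b`) are the cheapest place to try `load_generator` first.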