TL;DR summarization checkpoints

vwxyzjn's Collections

RLOO / PPOv2 TL;DR summarize checkpoints

updated Aug 1, 2024

The checkpoints were trained as described in https://arxiv.org/abs/2403.17031 and are taken from https://wandb.ai/costa-huang/tldr_summarize/reports/Release--Vmlldzo3MT


cleanrl/EleutherAI_pythia-1b-deduped__sft__tldr
Text Generation • Updated May 15, 2024 • 3.12k

cleanrl/EleutherAI_pythia-1b-deduped__reward__tldr
Text Classification • Updated May 15, 2024 • 2.85k

cleanrl/EleutherAI_pythia-2.8b-deduped__sft__tldr
Text Generation • Updated May 15, 2024 • 196

cleanrl/EleutherAI_pythia-2.8b-deduped__reward__tldr
Text Classification • Updated May 15, 2024 • 172

cleanrl/EleutherAI_pythia-6.9b-deduped__sft__tldr
Text Generation • Updated May 15, 2024 • 18

cleanrl/EleutherAI_pythia-6.9b-deduped__reward__tldr
Text Classification • Updated May 7, 2024 • 130

cleanrl/EleutherAI_pythia-1b-deduped__ppo__tldr
Text Generation • Updated May 30, 2024 • 17

cleanrl/EleutherAI_pythia-6.9b-deduped__ppo__tldr
Text Generation • Updated May 30, 2024 • 6

cleanrl/EleutherAI_pythia-2.8b-deduped__ppo__tldr
Text Generation • Updated May 30, 2024 • 9
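
The nine checkpoint names above follow one pattern: a Pythia size (1b, 2.8b, 6.9b) and a training stage (sft, reward, ppo), joined with double underscores. The Python sketch below rebuilds the repository IDs from that pattern and shows one plausible way to load a checkpoint with the `transformers` library. The helper names `checkpoint_id` and `load_checkpoint` are hypothetical, chosen only for illustration. It is also an assumption, not something this page states, that each repository ships a tokenizer and that the reward models expose a single scalar output (`num_labels=1`).

```python
from itertools import product

# Sizes and stages as they appear in the checkpoint list above.
SIZES = ("1b", "2.8b", "6.9b")
STAGES = ("sft", "reward", "ppo")


def checkpoint_id(size: str, stage: str) -> str:
    """Build a repository ID following the naming pattern in the list above."""
    if size not in SIZES:
        raise ValueError(f"unknown size {size!r}; expected one of {SIZES}")
    if stage not in STAGES:
        raise ValueError(f"unknown stage {stage!r}; expected one of {STAGES}")
    return f"cleanrl/EleutherAI_pythia-{size}-deduped__{stage}__tldr"


def load_checkpoint(size: str, stage: str):
    """Hypothetical loader: downloads the weights from the Hugging Face Hub.

    Imported lazily so that building IDs does not require transformers.
    """
    from transformers import (
        AutoModelForCausalLM,
        AutoModelForSequenceClassification,
        AutoTokenizer,
    )

    repo = checkpoint_id(size, stage)
    # Assumption: the repository includes tokenizer files.
    tokenizer = AutoTokenizer.from_pretrained(repo)
    if stage == "reward":
        # Reward models are tagged "Text Classification" above;
        # num_labels=1 (a scalar reward head) is an assumption.
        model = AutoModelForSequenceClassification.from_pretrained(repo, num_labels=1)
    else:
        # SFT and PPO checkpoints are tagged "Text Generation" above.
        model = AutoModelForCausalLM.from_pretrained(repo)
    return tokenizer, model


# All nine repository IDs, grouped by size.
ALL_CHECKPOINTS = [checkpoint_id(size, stage) for size, stage in product(SIZES, STAGES)]
```

For example, `load_checkpoint("1b", "sft")` would fetch the smallest SFT policy. The call needs network access and enough memory for the chosen model size.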
