Online-DPO

updated Jan 8

  • trl-lib/pythia-1b-deduped-tldr-online-dpo

    Updated Aug 2, 2024 • 4

  • trl-lib/pythia-1b-deduped-tldr-sft

    Updated Aug 2, 2024 • 465

  • trl-lib/pythia-6.9b-deduped-tldr-online-dpo

    Updated Aug 2, 2024 • 1

  • trl-lib/pythia-2.8b-deduped-tldr-sft

    Updated Aug 2, 2024 • 1

  • trl-lib/pythia-2.8b-deduped-tldr-rm

    Updated Aug 2, 2024 • 1

  • trl-lib/pythia-6.9b-deduped-tldr-sft

    Updated Aug 2, 2024 • 3

  • trl-lib/pythia-6.9b-deduped-tldr-rm

    Updated Aug 2, 2024 • 2

  • trl-lib/pythia-1b-deduped-tldr-offline-dpo

    Text Generation • Updated Aug 2, 2024 • 16

  • trl-lib/pythia-2.8b-deduped-tldr-offline-dpo

    Text Generation • Updated Aug 2, 2024 • 2

  • trl-lib/pythia-6.9b-deduped-tldr-offline-dpo

    Text Generation • Updated Aug 2, 2024 • 14

  • trl-lib/pythia-2.8b-deduped-tldr-online-dpo

    Text Generation • Updated Aug 2, 2024 • 14