HuggingFaceH4's Collections
Awesome reward models

updated Apr 12, 2024

A curated collection of reward models for use with techniques such as rejection sampling and RLHF/RLAIF.

  • llm-blender/PairRM

    Text Generation • Updated Jan 22, 2024 • 5.53k • 199

  • openbmb/UltraRM-13b

    Updated Oct 14, 2023 • 1.8k • 59

  • OpenAssistant/reward-model-deberta-v3-large-v2

    Text Classification • Updated Feb 1, 2023 • 10.3k • 221

  • PKU-Alignment/beaver-7b-v1.0-reward

    Reinforcement Learning • Updated Apr 20, 2024 • 2.93k • 16
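
The collection description mentions rejection sampling. As a rough illustration of how a reward model fits into that technique, here is a minimal best-of-n sketch: score several candidate responses to one prompt and keep the highest-scoring one. The names `best_of_n` and `toy_reward` are hypothetical and not part of any listed model's API; `toy_reward` is a word-overlap stand-in so the example runs without downloading weights. In practice `reward_fn` would presumably wrap one of the models above, following that model's own card.

```python
from typing import Callable, Sequence, Tuple


def best_of_n(
    prompt: str,
    candidates: Sequence[str],
    reward_fn: Callable[[str, str], float],
) -> Tuple[str, float]:
    """Return the candidate with the highest reward, plus its score."""
    if not candidates:
        raise ValueError("need at least one candidate")
    # Score every candidate against the same prompt.
    scored = [(reward_fn(prompt, c), c) for c in candidates]
    # Keep the highest-scoring candidate (first one wins on ties).
    best_score, best = max(scored, key=lambda pair: pair[0])
    return best, best_score


def toy_reward(prompt: str, response: str) -> float:
    """Hypothetical stand-in for a reward model (assumption, not a real scorer):
    counts how many response words also appear in the prompt."""
    prompt_words = set(prompt.lower().split())
    return float(sum(w in prompt_words for w in response.lower().split()))


prompt = "Explain what a reward model does"
candidates = [
    "I don't know.",
    "A reward model scores a response to a prompt.",
    "Bananas are yellow.",
]
best, score = best_of_n(prompt, candidates, toy_reward)
print(best, score)
```

Passing the scorer in as a plain callable keeps the selection logic independent of any particular model, so swapping the toy function for a real reward model should not require changes to `best_of_n`.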