RichardForests's Collections
RL

updated May 2, 2024

  • Using Human Feedback to Fine-tune Diffusion Models without Any Reward Model

    Paper • 2311.13231 • Published Nov 22, 2023 • 29

  • Nash Learning from Human Feedback

    Paper • 2312.00886 • Published Dec 1, 2023 • 17

  • Secrets of RLHF in Large Language Models Part II: Reward Modeling

    Paper • 2401.06080 • Published Jan 11, 2024 • 29

  • MusicRL: Aligning Music Generation to Human Preferences

    Paper • 2402.04229 • Published Feb 6, 2024 • 17

  • OpenAssistant/reward-model-deberta-v3-large-v2

    Text Classification • Updated Feb 1, 2023 • 10.3k • 221

  • Iterative Reasoning Preference Optimization

    Paper • 2404.19733 • Published Apr 30, 2024 • 50