Step-DPO

A collection by xinlai • Updated Jul 1, 2024

Resources for "Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs": the paper, the Step-DPO-trained model checkpoints, and the training dataset (a minimal loading sketch follows the list).


  • xinlai/DeepSeekMath-RL-Step-DPO

    Text Generation • Updated Jun 28, 2024 • 12 downloads • 2 likes

  • xinlai/Qwen2-7B-Instruct-Step-DPO

    Text Generation • Updated Jun 29, 2024 • 11 downloads • 2 likes

  • xinlai/Qwen2-72B-Instruct-Step-DPO

    Text Generation • Updated Jun 28, 2024 • 15 downloads

  • xinlai/DeepSeekMath-Base-SFT-Step-DPO

    Text Generation • Updated Jun 28, 2024 • 3 downloads

  • xinlai/Qwen2-7B-SFT-Step-DPO

    Text Generation • Updated Jun 28, 2024 • 4 downloads

  • xinlai/Qwen1.5-32B-SFT-Step-DPO

    Text Generation • Updated Jun 28, 2024 • 4 downloads • 1 like

  • xinlai/Qwen2-57B-A14B-SFT-Step-DPO

    Text Generation • Updated Jun 28, 2024 • 5 downloads • 1 like

  • xinlai/Llama-3-70B-SFT-Step-DPO

    Text Generation • Updated Jun 28, 2024 • 9 downloads

  • xinlai/Qwen2-72B-SFT-Step-DPO

    Text Generation • Updated Jun 25, 2024 • 5 downloads • 1 like

  • xinlai/Math-Step-DPO-10K

    Dataset (Viewer) • Updated Jul 4, 2024 • 10.8k rows • 633 downloads • 52 likes

  • Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs

    Paper • arXiv:2406.18629 • Published Jun 26, 2024 • 43 upvotes
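
All of the items above are standard Hugging Face Hub repos, so they load with the usual transformers and datasets APIs. Below is a minimal sketch: the repo IDs are taken from the collection itself, while the example prompt, generation settings, and device placement are illustrative assumptions, not part of the collection. Any model ID from the list can be substituted for the one shown.

```python
# Minimal sketch: load one Step-DPO checkpoint and the Math-Step-DPO-10K
# dataset from the Hugging Face Hub. Repo IDs come from the collection;
# everything else (prompt, max_new_tokens) is an illustrative assumption.
from transformers import AutoModelForCausalLM, AutoTokenizer
from datasets import load_dataset

model_id = "xinlai/Qwen2-7B-Instruct-Step-DPO"  # any listed model works

tokenizer = AutoTokenizer.from_pretrained(model_id)
# device_map="auto" requires the accelerate package to be installed.
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Qwen2-Instruct checkpoints ship a chat template, so we can build the
# prompt with apply_chat_template.
messages = [{"role": "user", "content": "What is 15 * 17? Think step by step."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
# Strip the prompt tokens before decoding the model's answer.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))

# The ~10.8k step-wise preference pairs used for Step-DPO training.
ds = load_dataset("xinlai/Math-Step-DPO-10K", split="train")
print(ds[0])
```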