DPO Fine-Tuned Adapter - PairRM Dataset

🧠 Model

  • Base: meta-llama/Llama-3.2-1B-Instruct
  • Fine-tuned using TRL's DPOTrainer with the PairRM preference dataset (500 pairs)
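
As a point of reference, here is a minimal sketch of how the base model could be prepared for LoRA-based DPO training with transformers and peft. The LoRA rank, alpha, and target modules below are illustrative assumptions; the card does not report them.

```python
# Sketch: prepare the base model for LoRA fine-tuning.
# LoRA hyperparameters here are assumptions, not values from this card.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base_id = "meta-llama/Llama-3.2-1B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(base_id)
tokenizer.pad_token = tokenizer.eos_token  # padding token = eos_token, per the table below

model = AutoModelForCausalLM.from_pretrained(base_id)

lora_config = LoraConfig(
    r=16,                                 # assumed LoRA rank
    lora_alpha=32,                        # assumed scaling factor
    target_modules=["q_proj", "v_proj"],  # assumed target projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
```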

⚙️ Training Parameters

Parameter               Value
----------------------  -----------
Learning Rate           3e-5
Batch Size              4
Epochs                  3
Beta (DPO regularizer)  0.1
Max Input Length        1024 tokens
Max Prompt Length       512 tokens
Padding Token           eos_token
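
These values map directly onto TRL's DPOConfig; a hedged sketch follows. Argument names track recent TRL releases (older versions pass `tokenizer=` instead of `processing_class=`, and some accept `beta` on the trainer directly), so adjust for your installed version.

```python
# Sketch: DPO training configuration mirroring the table above.
from trl import DPOConfig, DPOTrainer

config = DPOConfig(
    output_dir="dpo-pairrm-lora-adapter",
    learning_rate=3e-5,
    per_device_train_batch_size=4,
    num_train_epochs=3,
    beta=0.1,                # DPO regularizer
    max_length=1024,         # max input length in tokens
    max_prompt_length=512,
)

trainer = DPOTrainer(
    model=model,             # PEFT-wrapped model from the sketch above
    args=config,
    train_dataset=dataset,   # loaded in the Dataset section below
    processing_class=tokenizer,
)
trainer.train()
```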

📦 Dataset

  • Source: pairrm_preferences.csv
  • Size: 500 preference pairs with prompt, chosen, and rejected columns
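
A sketch of loading such a CSV into the prompt/chosen/rejected format that DPOTrainer expects, assuming the file path named above:

```python
# Sketch: load the preference pairs from the CSV file.
from datasets import load_dataset

dataset = load_dataset("csv", data_files="pairrm_preferences.csv", split="train")
print(dataset.column_names)  # expected: ['prompt', 'chosen', 'rejected']
```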

📂 Output

  • Adapter saved and uploaded as Likhith003/dpo-pairrm-lora-adapter
  • Format: safetensors, BF16 tensors (1.24B parameters)
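
To use the published adapter, it can be loaded on top of the base model with peft; a minimal sketch:

```python
# Sketch: load the published adapter on top of the base model for inference.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Llama-3.2-1B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id)

# Attach the DPO adapter weights to the base model.
model = PeftModel.from_pretrained(model, "Likhith003/dpo-pairrm-lora-adapter")

inputs = tokenizer("Explain DPO in one sentence.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```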