DPO Fine-Tuned Adapter - LLM Judge Dataset

🧠 Model

  • Base: meta-llama/Llama-3.2-1B-Instruct
  • Fine-tuned using TRL's DPOTrainer with the LLM Judge preference dataset (50 pairs)

βš™οΈ Training Parameters

| Parameter | Value |
| --- | --- |
| Learning Rate | 5e-5 |
| Batch Size | 4 |
| Epochs | 3 |
| Beta (DPO regularizer) | 0.1 |
| Max Input Length | 1024 tokens |
| Max Prompt Length | 512 tokens |
| Padding Token | eos_token |
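
The snippet below is a minimal sketch of how these parameters map onto TRL's `DPOConfig` and `DPOTrainer`. The hyperparameters come from the table above; the LoRA settings (`r`, `lora_alpha`, `lora_dropout`) are assumptions, since the card does not list them, and the tokenizer keyword changed across TRL versions (`tokenizer=` in older releases, `processing_class=` in newer ones).

```python
# Sketch of the training setup described above. LoRA hyperparameters are
# assumed (the card does not state them); everything else follows the table.
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "meta-llama/Llama-3.2-1B-Instruct"
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token  # padding token = eos_token

args = DPOConfig(
    output_dir="dpo-llmjudge-lora-adapter",
    learning_rate=5e-5,
    per_device_train_batch_size=4,
    num_train_epochs=3,
    beta=0.1,               # DPO regularizer
    max_length=1024,        # max input length (prompt + completion)
    max_prompt_length=512,
)

trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=load_dataset("csv", data_files="llm_judge_preferences.csv")["train"],
    processing_class=tokenizer,  # `tokenizer=` in older TRL versions
    peft_config=LoraConfig(      # assumed LoRA settings, not from the card
        r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM"
    ),
)
trainer.train()
```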

πŸ“¦ Dataset

  • Source: `llm_judge_preferences.csv`
  • Size: 50 human-labeled pairs with `prompt`, `chosen`, and `rejected` columns
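
As a quick sanity check, the CSV can be loaded with Hugging Face `datasets` and verified against the column layout `DPOTrainer` expects; the expected values in the comments come from the description above.

```python
# Load the preference CSV and confirm it matches the layout DPOTrainer expects.
from datasets import load_dataset

dataset = load_dataset("csv", data_files="llm_judge_preferences.csv")["train"]
print(dataset.column_names)  # expected: ['prompt', 'chosen', 'rejected']
print(len(dataset))          # expected: 50 preference pairs
```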

📂 Output

  • Adapter saved and uploaded as `Likhith003/dpo-llmjudge-lora-adapter` (see the loading example below)
  • Model size: 1.24B params, BF16 tensors, safetensors format
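
A hedged example of loading the published adapter for inference with PEFT: the base model and adapter ids come from this card, while the prompt and generation settings are illustrative.

```python
# Load the base model, attach the LoRA adapter, and generate a completion.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-1B-Instruct", torch_dtype=torch.bfloat16
)
model = PeftModel.from_pretrained(base, "Likhith003/dpo-llmjudge-lora-adapter")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B-Instruct")

inputs = tokenizer("Explain DPO in one sentence.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```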