---
license: apache-2.0
language:
  - en
library_name: transformers
tags:
  - llama
  - dpo
  - preference-optimization
  - PEFT
  - instruction-tuning
pipeline_tag: text-generation
---

# DPO Fine-Tuned Adapter - LLM Judge Dataset

## 🧠 Model

- Base: `meta-llama/Llama-3.2-1B-Instruct`
- Fine-tuned using TRL's `DPOTrainer` on the LLM Judge preference dataset (50 pairs)

βš™οΈ Training Parameters

| Parameter              | Value       |
| ---------------------- | ----------- |
| Learning rate          | 5e-5        |
| Batch size             | 4           |
| Epochs                 | 3           |
| Beta (DPO regularizer) | 0.1         |
| Max input length       | 1024 tokens |
| Max prompt length      | 512 tokens  |
| Padding token          | `eos_token` |
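The beta value above scales the implicit reward margin in the DPO objective, `-log σ(β · margin)`. A minimal pure-Python sketch of that loss for a single preference pair (the log-probability values are illustrative, not taken from this training run):

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one pair: -log sigmoid(beta * margin), where margin is
    how much more the policy prefers chosen over rejected than the
    reference model does."""
    margin = (policy_chosen_logp - ref_chosen_logp) - (
        policy_rejected_logp - ref_rejected_logp)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# Illustrative values: the policy favors the chosen response more than the
# reference does, so the loss drops below log(2) ≈ 0.693 (the no-margin value).
loss = dpo_loss(-12.0, -20.0, -14.0, -18.0, beta=0.1)
print(loss)
```

With zero margin the loss is exactly `log(2)`; a larger beta sharpens the penalty for the same margin.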

## 📦 Dataset

- Source: `llm_judge_preferences.csv`
- Size: 50 human-labeled pairs with `prompt`, `chosen`, and `rejected` columns
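The CSV layout above maps directly onto the column-wise dict format `DPOTrainer` consumes. A stdlib-only sketch of loading it; the two sample rows below are hypothetical stand-ins, not actual rows from `llm_judge_preferences.csv`:

```python
import csv
import io

# Hypothetical 2-row stand-in for llm_judge_preferences.csv
# (the real file holds 50 pairs).
sample_csv = """prompt,chosen,rejected
"Explain DPO in one line.","DPO optimizes a policy directly on preference pairs.","DPO is a database."
"What is a LoRA adapter?","A small low-rank weight update trained on top of a frozen base model.","A power adapter."
"""

def load_preference_pairs(fileobj):
    """Read prompt/chosen/rejected rows into column-wise lists."""
    rows = list(csv.DictReader(fileobj))
    return {
        "prompt":   [r["prompt"] for r in rows],
        "chosen":   [r["chosen"] for r in rows],
        "rejected": [r["rejected"] for r in rows],
    }

pairs = load_preference_pairs(io.StringIO(sample_csv))
print(len(pairs["prompt"]))  # 2 rows in this stand-in sample
```

In practice the resulting dict can be wrapped with `datasets.Dataset.from_dict(pairs)` before being handed to the trainer.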

## 📂 Output

- Adapter saved and uploaded as `Likhith003/dpo-llmjudge-lora-adapter`
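A sketch of attaching the adapter to the base model for inference with `transformers` and `peft`. The `generate` helper is illustrative, not part of the released code, and the heavy imports are deferred inside the function so the sketch can be read without those libraries installed:

```python
ADAPTER_ID = "Likhith003/dpo-llmjudge-lora-adapter"
BASE_MODEL = "meta-llama/Llama-3.2-1B-Instruct"

def generate(prompt: str, max_new_tokens: int = 128) -> str:
    """Load the base model, attach the LoRA adapter, and generate."""
    # Deferred imports: transformers, peft, and torch are only needed
    # when this helper is actually called.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
    base = AutoModelForCausalLM.from_pretrained(BASE_MODEL)
    model = PeftModel.from_pretrained(base, ADAPTER_ID)  # attach LoRA weights

    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        out = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(out[0], skip_special_tokens=True)

# Example (requires transformers, peft, torch, and Hugging Face access
# to the gated Llama base model):
# print(generate("Summarize DPO in one sentence."))
```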