sciarrilli/Llama-3.2-3B-DPO
Transformers
Safetensors
trl-lib/ultrafeedback_binarized
Generated from Trainer
trl
dpo
arxiv:2305.18290
main
Commit History
End of training (5cbfce9, verified): sciarrilli committed on Mar 18
Model save (edcf956, verified): sciarrilli committed on Mar 18
Training in progress, step 116 (4a58d24, verified): sciarrilli committed on Mar 18
Model save (356ff86, verified): sciarrilli committed on Mar 18
Initial commit (cea903b, verified): sciarrilli committed on Mar 18