Trained on a different random sample of the same datasets used by loyal-piano-m7, then further trained with cDPO on a blend of RLHF datasets.

Several intermediate checkpoints from the cDPO training run are available on separate branches of this repository.
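
As a rough sketch of how a checkpoint on a branch could be loaded, the `revision` argument of `from_pretrained` in the `transformers` library selects a branch, tag, or commit. The helper name `load_checkpoint` is hypothetical, and the branch names are not listed here, so look them up on the repository before passing one in.

```python
REPO_ID = "chargoddard/servile-harpsichord-cdpo"


def load_checkpoint(revision="main"):
    """Load the tokenizer and model from a given branch of the repository.

    `revision` is a branch name, tag, or commit hash. The default "main"
    is the final model; intermediate checkpoint branch names are NOT
    known here and must be taken from the repository itself.
    """
    # Imported lazily so that defining this helper does not require
    # transformers to be installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(REPO_ID, revision=revision)
    model = AutoModelForCausalLM.from_pretrained(
        REPO_ID,
        revision=revision,
        torch_dtype="auto",  # use the dtype stored in the checkpoint
    )
    return tokenizer, model
```

Calling `load_checkpoint()` with no argument loads the default branch; pass a branch name to load an intermediate checkpoint instead.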

Uses the Alpaca prompt format.
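
For illustration, here is a minimal sketch of a prompt builder, assuming the model expects the standard Stanford Alpaca template unchanged (the exact wording is not spelled out here, so treat it as an assumption). The function name `build_alpaca_prompt` is hypothetical.

```python
# Standard Stanford Alpaca templates (assumed, not confirmed for this model).
ALPACA_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input "
    "that provides further context. Write a response that appropriately "
    "completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input}\n\n"
    "### Response:\n"
)

ALPACA_NO_INPUT = (
    "Below is an instruction that describes a task. Write a response that "
    "appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Response:\n"
)


def build_alpaca_prompt(instruction, input_text=None):
    """Format an instruction (and optional input) as an Alpaca-style prompt."""
    if input_text:
        return ALPACA_WITH_INPUT.format(instruction=instruction, input=input_text)
    return ALPACA_NO_INPUT.format(instruction=instruction)
```

The returned string ends with `### Response:` and a newline, so the model's generation is read as the response.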

Model size: 7.24B params (Safetensors, BF16).