all_tasks_combined_8b_sft

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the identity and the data_mc_filtered datasets. It achieves the following results on the evaluation set:

  • Loss: 0.4943

Model description

More information needed

Intended uses & limitations

More information needed
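
In the absence of fuller documentation, a minimal inference sketch is shown below. It assumes the fine-tune inherits the Llama-3-Instruct chat template from its base model; the prompt and generation settings are illustrative, not prescribed by the author.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "hlillemark/all_tasks_combined_8b_sft"

# Load in bfloat16, matching the checkpoint's stored tensor type.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Format a chat turn with the tokenizer's built-in chat template
# (inherited from Meta-Llama-3-8B-Instruct).
messages = [{"role": "user", "content": "Briefly explain gradient accumulation."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```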

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (see the TrainingArguments sketch after this list):

  • learning_rate: 1e-05
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 32
  • total_eval_batch_size: 8
  • optimizer: AdamW (adamw_torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 3.0
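
Expressed as transformers TrainingArguments, these settings correspond roughly to the sketch below. This is a reconstruction, not the author's launch script: the 4-device multi-GPU setup comes from the launcher (e.g. torchrun or accelerate), dataset and model wiring are omitted, and bf16 training is an assumption based on the checkpoint's tensor type.

```python
from transformers import TrainingArguments

# Effective train batch size: 2 per device x 4 GPUs x 4 accumulation steps = 32.
training_args = TrainingArguments(
    output_dir="all_tasks_combined_8b_sft",
    learning_rate=1e-5,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=4,
    num_train_epochs=3.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    optim="adamw_torch",  # AdamW with default betas=(0.9, 0.999) and eps=1e-8
    seed=42,
    bf16=True,            # assumption: trained in bfloat16, matching the saved weights
)
```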

Training results

Training Loss   Epoch    Step   Validation Loss
0.4639          0.0929     50   0.5398
0.4939          0.1857    100   0.5122
0.4822          0.2786    150   0.5242
0.4701          0.3714    200   0.5521
0.4216          0.4643    250   0.5374
0.4159          0.5571    300   0.5146
0.4502          0.6500    350   0.5022
0.4625          0.7428    400   0.4985
0.4313          0.8357    450   0.4716
0.4472          0.9285    500   0.4771
0.2753          1.0204    550   0.5026
0.2877          1.1133    600   0.4784
0.3038          1.2061    650   0.4795
0.2944          1.2990    700   0.4682
0.2722          1.3918    750   0.4681
0.2734          1.4847    800   0.4480
0.2826          1.5775    850   0.4484
0.2344          1.6704    900   0.4388
0.2437          1.7632    950   0.4272
0.2113          1.8561   1000   0.4233
0.2548          1.9489   1050   0.4117
0.1126          2.0409   1100   0.5031
0.1128          2.1337   1150   0.4821
0.0993          2.2266   1200   0.4997
0.0978          2.3194   1250   0.4896
0.1056          2.4123   1300   0.4980
0.0897          2.5051   1350   0.4883
0.0872          2.5980   1400   0.4941
0.0916          2.6908   1450   0.4939
0.0844          2.7837   1500   0.4945
0.0959          2.8765   1550   0.4943
0.0940          2.9694   1600   0.4941

Framework versions

  • Transformers 4.49.0
  • Pytorch 2.5.1+cu124
  • Datasets 3.2.0
  • Tokenizers 0.21.0
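
To reproduce this environment, a requirements pin along these lines should work; note that the +cu124 tag on PyTorch denotes a CUDA 12.4 build, which is typically installed from the PyTorch wheel index rather than plain PyPI.

```
transformers==4.49.0
torch==2.5.1      # +cu124 build; use the matching PyTorch CUDA wheel index
datasets==3.2.0
tokenizers==0.21.0
```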