train_2025-04-16-10-30-19

This model is a fine-tuned version of deepseek-ai/DeepSeek-R1-Distill-Qwen-7B on the "myself" dataset. It achieves the following results on the evaluation set:

  • Loss: 1.2580
  • Num Input Tokens Seen: 9320992
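Assuming the reported loss is the mean per-token cross-entropy in nats (the usual convention for causal-LM trainers), the corresponding evaluation perplexity can be derived as:

```python
import math

eval_loss = 1.2580  # final validation loss reported above

# Perplexity is the exponential of the mean cross-entropy loss
perplexity = math.exp(eval_loss)
print(f"{perplexity:.2f}")  # ≈ 3.52
```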

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • optimizer: AdamW (adamw_torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • num_epochs: 3.0
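As a minimal sketch of how the cosine scheduler decays the learning rate over this run, assuming standard cosine annealing to zero with no warmup (warmup settings are not listed above; total steps taken from the results table):

```python
import math

BASE_LR = 5e-5       # learning_rate above
TOTAL_STEPS = 3000   # 3 epochs at 1000 steps/epoch, per the results table

def cosine_lr(step: int) -> float:
    """Cosine-annealed learning rate, decaying from BASE_LR down to 0."""
    return 0.5 * BASE_LR * (1.0 + math.cos(math.pi * step / TOTAL_STEPS))

print(cosine_lr(0))     # 5e-05 at the start
print(cosine_lr(1500))  # 2.5e-05 at the halfway point
```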

Training results

Training Loss   Epoch   Step   Validation Loss   Input Tokens Seen
1.1790          0.1      100   1.3332             305024
1.3519          0.2      200   1.3045             617056
1.4670          0.3      300   1.2898             921984
1.1378          0.4      400   1.2818            1245472
1.4098          0.5      500   1.2750            1558144
1.3778          0.6      600   1.2697            1876960
1.4057          0.7      700   1.2629            2190688
1.2870          0.8      800   1.2598            2492320
1.2165          0.9      900   1.2564            2815776
1.5059          1.0     1000   1.2533            3109056
1.1756          1.1     1100   1.2536            3419200
1.2237          1.2     1200   1.2521            3726240
1.1701          1.3     1300   1.2534            4038528
1.1740          1.4     1400   1.2493            4347040
1.1501          1.5     1500   1.2498            4665984
1.1347          1.6     1600   1.2494            4976160
0.8999          1.7     1700   1.2477            5293696
1.4316          1.8     1800   1.2469            5591584
1.1816          1.9     1900   1.2453            5893024
1.1452          2.0     2000   1.2457            6218112
1.1880          2.1     2100   1.2524            6524960
0.9034          2.2     2200   1.2543            6838336
1.0492          2.3     2300   1.2568            7135936
1.0242          2.4     2400   1.2573            7466304
1.1581          2.5     2500   1.2579            7777664
1.0277          2.6     2600   1.2579            8072640
0.9394          2.7     2700   1.2578            8397248
1.0966          2.8     2800   1.2580            8705184
0.9260          2.9     2900   1.2578            9003168
0.9383          3.0     3000   1.2580            9320992
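One way to read the table: validation loss bottoms out near the end of epoch 2 and drifts up slightly through epoch 3, a mild overfitting signal. A hypothetical sketch for locating the best checkpoint from these numbers (using a subset of the (epoch, validation loss) pairs above):

```python
# (epoch, validation_loss) pairs copied from the table above
val_loss = {
    1.8: 1.2469, 1.9: 1.2453, 2.0: 1.2457,
    2.1: 1.2524, 2.9: 1.2578, 3.0: 1.2580,
}

# The checkpoint with the lowest validation loss
best_epoch = min(val_loss, key=val_loss.get)
print(best_epoch, val_loss[best_epoch])  # 1.9 1.2453
```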

Framework versions

  • PEFT 0.14.0
  • Transformers 4.51.1
  • Pytorch 2.6.0+cu124
  • Datasets 3.5.0
  • Tokenizers 0.21.1
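To reproduce this environment, a pinned install along these lines should work (package names are assumed to be the usual PyPI distributions; the CUDA-specific torch build may require the matching extra index URL):

```shell
pip install "peft==0.14.0" "transformers==4.51.1" \
    "torch==2.6.0" "datasets==3.5.0" "tokenizers==0.21.1"
```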
This repository, ls155981/my-finetuned-model-trian1, is a PEFT adapter for deepseek-ai/DeepSeek-R1-Distill-Qwen-7B.