Commit 60d4c28 by NotShrirang (verified) · 1 parent: 04d0920

Update README.md

  1. README.md +37 -1
README.md CHANGED
@@ -23,7 +23,6 @@ Due to its lightweight architecture, this model can be deployed efficiently on l
  3. Gradient Accumulation: Used to handle larger batch sizes within GPU memory limits.
  4. Optimizer: AdamW with learning rate scheduling.
  5. Cosine Scheduler: Used cosine learning rate scheduler for training stability. (500 warm-up steps, 2000 steps for the cosine schedule.)
- 6. Hardware: Trained on 8xA100 GPUs with mixed precision training.
 
  ## Use Cases
  1. Assisting developers and analysts in writing SQL queries.
@@ -51,6 +50,43 @@ outputs = model.generate(**inputs, max_new_tokens=100)
  print(tokenizer.decode(outputs[0], skip_special_tokens=True))
  ```
 
+ ## Training Details
+
+ - **Total Steps:** 25,000
+ - **Batch Size:** 4
+ - **Optimizer:** AdamW
+ - **Learning Rate:** 5e-5
+
+ ### Training and Validation Loss Progression
+
+ | Step | Training Loss | Validation Loss |
+ |-------|--------------|----------------|
+ | 1000 | 1.0017 | 1.0256 |
+ | 2000 | 1.1644 | 0.8818 |
+ | 3000 | 0.7851 | 0.8507 |
+ | 4000 | 0.7416 | 0.8322 |
+ | 5000 | 0.6960 | 0.8184 |
+ | 6000 | 1.0118 | 0.8068 |
+ | 7000 | 0.9897 | 0.7997 |
+ | 8000 | 0.9165 | 0.7938 |
+ | 9000 | 0.8048 | 0.7875 |
+ | 10000 | 0.8869 | 0.7822 |
+ | 11000 | 0.8387 | 0.7788 |
+ | 12000 | 0.8117 | 0.7746 |
+ | 13000 | 0.7259 | 0.7719 |
+ | 14000 | 0.8100 | 0.7678 |
+ | 15000 | 0.6901 | 0.7626 |
+ | 16000 | 0.9630 | 0.7600 |
+ | 17000 | 0.6599 | 0.7571 |
+ | 18000 | 0.6770 | 0.7541 |
+ | 19000 | 0.7360 | 0.7509 |
+ | 20000 | 0.7170 | 0.7458 |
+ | 21000 | 0.7993 | 0.7446 |
+ | 22000 | 0.5846 | 0.7412 |
+ | 23000 | 0.8269 | 0.7411 |
+ | 24000 | 0.5817 | 0.7379 |
+ | 25000 | 0.5772 | 0.7357 |
+
  - **Developed by:** [NotShrirang](https://huggingface.co/NotShrirang)
  - **Language(s) (NLP):** [en]
  - **License:** [apache-2.0]
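
Note on the scheduler settings: the README in this diff mentions a cosine learning rate schedule with 500 warm-up steps and 2000 cosine steps, and the added section lists a learning rate of 5e-5. The sketch below shows one way such a schedule could be computed. It is not taken from the repository: the function name `lr_at_step`, the linear warm-up shape, the decay floor of 0, holding the floor once the cosine phase ends, and the use of 5e-5 as the peak rate are all assumptions made for illustration.

```python
import math


def lr_at_step(step, peak_lr=5e-5, warmup_steps=500, cosine_steps=2000, min_lr=0.0):
    """Hypothetical helper: linear warm-up to peak_lr, then cosine decay to min_lr.

    Defaults mirror the numbers quoted in the README; how the author
    actually combined them is an assumption.
    """
    if step < warmup_steps:
        # Assumed linear ramp from 0 up to peak_lr over the warm-up phase.
        return peak_lr * step / warmup_steps
    # Fraction of the cosine phase completed, clamped so the rate
    # stays at min_lr once the schedule has finished (assumption).
    progress = min((step - warmup_steps) / cosine_steps, 1.0)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for s in (0, 250, 500, 1500, 2500):
        print(s, lr_at_step(s))
```

Under these assumptions the rate peaks at step 500 and reaches the floor at step 2500; what happened over the remaining steps of the 25,000-step run is not stated in the README.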