Update README.md
README.md
CHANGED
@@ -23,7 +23,6 @@ Due to its lightweight architecture, this model can be deployed efficiently on l
 3. Gradient Accumulation: Used to handle larger batch sizes within GPU memory limits.
 4. Optimizer: AdamW with learning rate scheduling.
 5. Cosine Scheduler: Used a cosine learning rate scheduler for training stability (500 warm-up steps, 2,000 steps for the cosine schedule).
-6. Hardware: Trained on 8xA100 GPUs with mixed precision training.
 
 ## Use Cases
 1. Assisting developers and analysts in writing SQL queries.
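The recipe in items 3-5 (AdamW, gradient accumulation, warm-up plus cosine decay) maps directly onto standard `torch`/`transformers` utilities. Below is a minimal runnable sketch, not the author's actual training script: the toy model, placeholder loss, and the accumulation factor of 8 are assumptions, and it follows the 2,000-step cosine schedule as written even though the card lists 25,000 total steps.

```python
import torch
from transformers import get_cosine_schedule_with_warmup

# Toy stand-in model and data so the schedule logic runs end to end;
# the real model and dataset are defined elsewhere in the card.
model = torch.nn.Linear(16, 16)
data = [torch.randn(4, 16) for _ in range(64)]  # batch size 4, as in the card

# AdamW with the learning rate the card lists (5e-5).
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

# 500 warm-up steps, then cosine decay over a 2,000-step schedule.
scheduler = get_cosine_schedule_with_warmup(
    optimizer, num_warmup_steps=500, num_training_steps=2000
)

accumulation_steps = 8  # assumed value; the card does not state it
optimizer.zero_grad()
for step, batch in enumerate(data):
    loss = model(batch).pow(2).mean()        # placeholder loss
    (loss / accumulation_steps).backward()   # scale so accumulated grads average
    if (step + 1) % accumulation_steps == 0:
        optimizer.step()                     # one update per accumulated "large" batch
        scheduler.step()
        optimizer.zero_grad()
```

Scaling the loss by the accumulation factor keeps the effective gradient equal to the average over the accumulated micro-batches, which is what makes this equivalent to training with a larger batch size.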
@@ -51,6 +50,43 @@ outputs = model.generate(**inputs, max_new_tokens=100)
 print(tokenizer.decode(outputs[0], skip_special_tokens=True))
 ```
 
+## Training Details
+
+- **Total Steps:** 25,000
+- **Batch Size:** 4
+- **Optimizer:** AdamW
+- **Learning Rate:** 5e-5
+
+### Training and Validation Loss Progression
+
+| Step  | Training Loss | Validation Loss |
+|-------|---------------|-----------------|
+| 1000  | 1.0017        | 1.0256          |
+| 2000  | 1.1644        | 0.8818          |
+| 3000  | 0.7851        | 0.8507          |
+| 4000  | 0.7416        | 0.8322          |
+| 5000  | 0.6960        | 0.8184          |
+| 6000  | 1.0118        | 0.8068          |
+| 7000  | 0.9897        | 0.7997          |
+| 8000  | 0.9165        | 0.7938          |
+| 9000  | 0.8048        | 0.7875          |
+| 10000 | 0.8869        | 0.7822          |
+| 11000 | 0.8387        | 0.7788          |
+| 12000 | 0.8117        | 0.7746          |
+| 13000 | 0.7259        | 0.7719          |
+| 14000 | 0.8100        | 0.7678          |
+| 15000 | 0.6901        | 0.7626          |
+| 16000 | 0.9630        | 0.7600          |
+| 17000 | 0.6599        | 0.7571          |
+| 18000 | 0.6770        | 0.7541          |
+| 19000 | 0.7360        | 0.7509          |
+| 20000 | 0.7170        | 0.7458          |
+| 21000 | 0.7993        | 0.7446          |
+| 22000 | 0.5846        | 0.7412          |
+| 23000 | 0.8269        | 0.7411          |
+| 24000 | 0.5817        | 0.7379          |
+| 25000 | 0.5772        | 0.7357          |
+
 - **Developed by:** [NotShrirang](https://huggingface.co/NotShrirang)
 - **Language(s) (NLP):** [en]
 - **License:** [apache-2.0]
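Only the tail of the card's usage snippet is visible as context in the hunk above. For orientation, here is a self-contained sketch of the standard `transformers` loading-and-generation pattern it presumably belongs to; the repo id placeholder and the prompt are hypothetical, and only the last two lines are confirmed by the diff.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repo id; substitute the actual checkpoint from this model card.
model_id = "NotShrirang/<model-name>"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Hypothetical prompt; the card's full snippet defines the real input format.
prompt = "Write an SQL query to list all employees hired after 2020."
inputs = tokenizer(prompt, return_tensors="pt")

# These two lines match the context shown in the hunk above.
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```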