Update README.md
README.md
CHANGED
@@ -12,7 +12,17 @@ base_model:
 ---

 # DeepSeek R1 Distill Qwen 1.5B finetuned for SQL query generation
-This model is a fine-tuned version of DeepSeek
+This model is a fine-tuned version of DeepSeek R1 Distill Qwen 1.5B, specifically optimized for SQL query generation. It has been trained on the GretelAI Synthetic Text-to-SQL dataset to enhance its ability to convert natural language prompts into accurate SQL queries.
+
+Due to its lightweight architecture, this model can be deployed efficiently on local machines without requiring a GPU, making it well suited for on-premises inference in resource-constrained environments. It balances performance and efficiency, offering businesses and developers a cost-effective SQL generation solution.
+
+## Training Methodology
+1. Fine-tuning approach: LoRA (Low-Rank Adaptation) for efficient parameter tuning.
+2. Precision: bfloat16 (bf16) to reduce memory consumption while maintaining numerical stability.
+3. Gradient Accumulation: Used to handle larger effective batch sizes within GPU memory limits.
+4. Optimizer: AdamW with learning rate scheduling.
+5. Scheduler: Cosine learning rate schedule for training stability (500 warm-up steps, 2000 steps for the cosine schedule).
+6. Hardware: Trained on 8xA100 GPUs with mixed precision training.
 
 ## Use Cases
 1. Assisting developers and analysts in writing SQL queries.
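
The added description claims CPU-only local inference is practical for this 1.5B model. Below is a minimal sketch of what that looks like with the Hugging Face `transformers` API; the repository id, prompt format, and generation settings are illustrative assumptions rather than details taken from this commit.

```python
# Minimal sketch: CPU-only SQL generation with a 1.5B distilled model.
# The model id points at the base checkpoint; swap in the fine-tuned repository id.
# The schema/question prompt format is an assumption for illustration.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"  # assumed; replace with the fine-tuned repo

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)  # defaults to CPU, fp32

prompt = (
    "Schema: employees(id, name, department, salary)\n"
    "Question: List the names of employees in the Sales department earning over 50000.\n"
    "SQL:"
)
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```

Greedy decoding (`do_sample=False`) keeps the generated SQL deterministic, which is usually what you want for query generation.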
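The training methodology items map directly onto `peft` and `transformers` configuration. The sketch below shows that mapping under the stated recipe; the LoRA rank/alpha, per-device batch size, and learning rate are placeholders this card does not specify.

```python
# Sketch of the described recipe: LoRA adapters, bf16, gradient accumulation,
# AdamW, and a cosine schedule with 500 warm-up steps over 2000 training steps.
# r, lora_alpha, per_device_train_batch_size, and learning_rate are assumed values.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, TrainingArguments

base_model = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"
model = AutoModelForCausalLM.from_pretrained(base_model)

# Wrap the base model with low-rank adapters so only a small parameter subset trains.
lora_config = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM")
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

training_args = TrainingArguments(
    output_dir="r1-distill-qwen-1.5b-sql",
    bf16=True,                        # bfloat16 mixed precision
    per_device_train_batch_size=4,    # assumed per-GPU batch size
    gradient_accumulation_steps=8,    # larger effective batch within memory limits
    optim="adamw_torch",              # AdamW optimizer
    learning_rate=2e-4,               # assumed
    lr_scheduler_type="cosine",       # cosine decay after warm-up
    warmup_steps=500,
    max_steps=2000,
)
# training_args and the PEFT-wrapped model are then handed to a Trainer together
# with the tokenized GretelAI Synthetic Text-to-SQL dataset.
```

With these placeholder values, the effective batch size per optimizer step would be 4 × 8 × 8 GPUs = 256 sequences.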