---
library_name: transformers
tags:
- code
- peft
- sql-generation
- text-generation-inference
license: apache-2.0
datasets:
- gretelai/synthetic_text_to_sql
language:
- en
base_model:
- deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
pipeline_tag: text-generation
---

# DeepSeek R1 Distill Qwen 1.5B fine-tuned for SQL query generation

This model is a fine-tuned version of DeepSeek R1 Distill Qwen 1.5B, optimized for SQL query generation. It was trained on the GretelAI Synthetic Text-to-SQL dataset to improve its ability to convert natural-language prompts into accurate SQL queries.

Thanks to its lightweight 1.5B-parameter architecture, the model can be deployed efficiently on local machines without a GPU, making it well suited to on-premises inference in resource-constrained environments. It offers a practical balance between output quality and compute cost for businesses and developers who need an affordable SQL generation solution.

## Training Methodology

1. Fine-tuning approach: LoRA (Low-Rank Adaptation) for parameter-efficient tuning (a configuration sketch follows this list).
2. Precision: bfloat16 (bf16) to reduce memory consumption while maintaining numerical stability.
3. Gradient accumulation: used to reach larger effective batch sizes within GPU memory limits.
4. Optimizer: AdamW with learning-rate scheduling.
5. Scheduler: a cosine learning-rate schedule for training stability (500 warm-up steps, 2,000 steps for the cosine schedule).
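
As a rough reconstruction of this recipe with 🤗 Transformers and PEFT: the LoRA rank, alpha, and target modules, as well as the gradient-accumulation factor, are not documented in this card, so those values below are illustrative placeholders rather than the actual training configuration.

```python
import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, TrainingArguments

# Load the base model in bf16, matching the training precision described above.
base = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",
    torch_dtype=torch.bfloat16,
)

# LoRA adapter config. r, lora_alpha, and target_modules are ASSUMED values;
# the card does not state which projections were adapted or at what rank.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)

training_args = TrainingArguments(
    output_dir="sql-coder-lora",
    per_device_train_batch_size=4,   # batch size 4, per Training Details below
    gradient_accumulation_steps=4,   # accumulation was used; the factor is assumed
    learning_rate=5e-5,              # per Training Details below
    optim="adamw_torch",             # AdamW, per the list above
    lr_scheduler_type="cosine",      # cosine schedule, per the list above
    warmup_steps=500,                # per the list above
    max_steps=25_000,                # total steps, per Training Details below
    bf16=True,                       # bf16 precision, per the list above
)
# training_args would then be passed to a transformers.Trainer together with
# the tokenized gretelai/synthetic_text_to_sql dataset.
```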

## Use Cases

1. Assisting developers and analysts in writing SQL queries.
2. Automating SQL query generation from user prompts in chatbots (a minimal helper sketch follows this list).
3. Enhancing SQL-based retrieval-augmented generation (RAG) systems.
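
For the chatbot and RAG scenarios, generation would typically sit behind a small helper that downstream components call. A minimal sketch, assuming the `model` and `tokenizer` loaded as shown in the How to Use section below; the function name and signature are illustrative:

```python
def text_to_sql(model, tokenizer, question: str, max_new_tokens: int = 100) -> str:
    """Turn a natural-language question into SQL text (hypothetical helper)."""
    inputs = tokenizer(question, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Drop the prompt tokens so only the newly generated text is returned.
    new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```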

## Limitations & Considerations

1. The model may generate incorrect or suboptimal SQL queries for complex database schemas.
2. It does not perform schema reasoning and requires clear table/column references in the input.
3. Further fine-tuning on domain-specific SQL data may be required for better accuracy.

## How to Use

You can load the model using 🤗 Transformers together with PEFT:

```python
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

# Load the LoRA adapter together with its base model, plus the tokenizer.
model = AutoPeftModelForCausalLM.from_pretrained("NotShrirang/DeepSeek-R1-Distill-Qwen-1.5B-SQL-Coder-PEFT")
tokenizer = AutoTokenizer.from_pretrained("NotShrirang/DeepSeek-R1-Distill-Qwen-1.5B-SQL-Coder-PEFT")

prompt = "Write a SQL query to get the total revenue from the sales table."
inputs = tokenizer(prompt, return_tensors="pt")

# Generate the SQL query and decode it back to text.
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
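
Since the model does not infer database schemas on its own (see Limitations), including table and column definitions in the prompt usually helps. The snippet below reuses the `model` and `tokenizer` loaded above; the exact prompt template used during fine-tuning is not documented here, so this format is only an illustration:

```python
# Spell out the schema in the prompt; this template is an assumption,
# not the format the model was trained on.
schema = """CREATE TABLE sales (
    id INTEGER PRIMARY KEY,
    product TEXT,
    quantity INTEGER,
    price REAL
);"""
prompt = (
    f"Given the following schema:\n{schema}\n"
    "Write a SQL query to get the total revenue from the sales table."
)

inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():  # inference only; no gradients needed
    outputs = model.generate(**inputs, max_new_tokens=150)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

For repeated inference, the LoRA weights can also be folded into the base model with `model = model.merge_and_unload()`, which removes the adapter indirection at generation time.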

## Training Details

- **Total Steps:** 25,000
- **Batch Size:** 4
- **Optimizer:** AdamW
- **Learning Rate:** 5e-5

### Training and Validation Loss Progression

| Step  | Training Loss | Validation Loss |
|-------|---------------|-----------------|
| 1000  | 1.0017        | 1.0256          |
| 2000  | 1.1644        | 0.8818          |
| 3000  | 0.7851        | 0.8507          |
| 4000  | 0.7416        | 0.8322          |
| 5000  | 0.6960        | 0.8184          |
| 6000  | 1.0118        | 0.8068          |
| 7000  | 0.9897        | 0.7997          |
| 8000  | 0.9165        | 0.7938          |
| 9000  | 0.8048        | 0.7875          |
| 10000 | 0.8869        | 0.7822          |
| 11000 | 0.8387        | 0.7788          |
| 12000 | 0.8117        | 0.7746          |
| 13000 | 0.7259        | 0.7719          |
| 14000 | 0.8100        | 0.7678          |
| 15000 | 0.6901        | 0.7626          |
| 16000 | 0.9630        | 0.7600          |
| 17000 | 0.6599        | 0.7571          |
| 18000 | 0.6770        | 0.7541          |
| 19000 | 0.7360        | 0.7509          |
| 20000 | 0.7170        | 0.7458          |
| 21000 | 0.7993        | 0.7446          |
| 22000 | 0.5846        | 0.7412          |
| 23000 | 0.8269        | 0.7411          |
| 24000 | 0.5817        | 0.7379          |
| 25000 | 0.5772        | 0.7357          |

- **Developed by:** [NotShrirang](https://huggingface.co/NotShrirang)
- **Language(s) (NLP):** English (en)
- **License:** Apache 2.0
- **Fine-tuned from model:** [deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B)