Update README.md
README.md
CHANGED
@@ -12,7 +12,17 @@ base_model:
 ---

 # DeepSeek R1 Distill Qwen 1.5B finetuned for SQL query generation
-This model is a fine-tuned version of DeepSeek
+This model is a fine-tuned version of DeepSeek R1 Distill Qwen 1.5B, specifically optimized for SQL query generation. It has been trained on the GretelAI Synthetic Text-to-SQL dataset to enhance its ability to convert natural language prompts into accurate SQL queries.
+
+Due to its lightweight architecture, this model can be deployed efficiently on local machines without requiring a GPU, making it well suited for on-premises inference in resource-constrained environments. It balances performance and efficiency, offering businesses and developers a cost-effective SQL generation solution.
+
+## Training Methodology
+1. Fine-tuning approach: LoRA (Low-Rank Adaptation) for efficient parameter tuning.
+2. Precision: bfloat16 (bf16) to reduce memory consumption while maintaining numerical stability.
+3. Gradient Accumulation: Used to handle larger effective batch sizes within GPU memory limits.
+4. Optimizer: AdamW with learning rate scheduling.
+5. Scheduler: Cosine learning rate schedule for training stability (500 warm-up steps, 2000 steps for the cosine schedule).
+6. Hardware: Trained on 8xA100 GPUs with mixed precision training.
 
 ## Use Cases
 1. Assisting developers and analysts in writing SQL queries.
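
The added description claims CPU-only local inference is practical for this 1.5B model. Below is a minimal sketch of what that looks like with the Hugging Face `transformers` API; the repository id, prompt format, and generation settings are illustrative assumptions rather than details taken from this commit.

```python
# Minimal sketch: CPU-only SQL generation with a 1.5B distilled model.
# The model id points at the base checkpoint; swap in the fine-tuned repository id.
# The schema/question prompt format is an assumption for illustration.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"  # assumed; replace with the fine-tuned repo

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)  # defaults to CPU, fp32

prompt = (
    "Schema: employees(id, name, department, salary)\n"
    "Question: List the names of employees in the Sales department earning over 50000.\n"
    "SQL:"
)
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```

Greedy decoding (`do_sample=False`) keeps the generated SQL deterministic, which is usually what you want for query generation.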
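The training methodology items map directly onto `peft` and `transformers` configuration. The sketch below shows that mapping under the stated recipe; the LoRA rank/alpha, per-device batch size, and learning rate are placeholders this card does not specify.

```python
# Sketch of the described recipe: LoRA adapters, bf16, gradient accumulation,
# AdamW, and a cosine schedule with 500 warm-up steps over 2000 training steps.
# r, lora_alpha, per_device_train_batch_size, and learning_rate are assumed values.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, TrainingArguments

base_model = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"
model = AutoModelForCausalLM.from_pretrained(base_model)

# Wrap the base model with low-rank adapters so only a small parameter subset trains.
lora_config = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM")
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

training_args = TrainingArguments(
    output_dir="r1-distill-qwen-1.5b-sql",
    bf16=True,                        # bfloat16 mixed precision
    per_device_train_batch_size=4,    # assumed per-GPU batch size
    gradient_accumulation_steps=8,    # larger effective batch within memory limits
    optim="adamw_torch",              # AdamW optimizer
    learning_rate=2e-4,               # assumed
    lr_scheduler_type="cosine",       # cosine decay after warm-up
    warmup_steps=500,
    max_steps=2000,
)
# training_args and the PEFT-wrapped model are then handed to a Trainer together
# with the tokenized GretelAI Synthetic Text-to-SQL dataset.
```

With these placeholder values, the effective batch size per optimizer step would be 4 × 8 × 8 GPUs = 256 sequences.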