DeepSeek-Light-V1: Optimized Version of DeepSeek-Coder-6.7B

Based in the Basque Country 🇪🇸

DeepSeek-Light-V1 is an optimized version of DeepSeek-Coder-6.7B, designed to reduce GPU memory consumption and make deployment easier. The optimization combines 4-bit quantization with pruning, roughly halving the parameter count while keeping the model usable at about half the original's performance (see the comparison below).

Key Optimizations 🚀

  • 4-bit Quantization (NF4, with bfloat16 compute): Stores weights in 4 bits, reducing VRAM usage with minimal precision loss.
  • Pruning: Removes redundant parameters to shrink the model (a rough sketch of the idea follows this list).
  • Optimized for lightweight deployment: Works on lower-end hardware.
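
The exact pruning recipe is not documented in this card. As a rough illustration of the idea only, here is a minimal sketch of unstructured magnitude (L1) pruning using PyTorch's built-in utilities; the 50% amount is an assumption based on the ~6.7B → ~3.5B parameter reduction reported below:

# Illustrative sketch only – not the actual recipe used for DeepSeek-Light-V1.
import torch.nn as nn
import torch.nn.utils.prune as prune

def magnitude_prune(model: nn.Module, amount: float = 0.5) -> nn.Module:
    # Zero out the `amount` fraction of smallest-magnitude weights in each
    # Linear layer, then bake the pruning mask into the weights permanently.
    for module in model.modules():
        if isinstance(module, nn.Linear):
            prune.l1_unstructured(module, name="weight", amount=amount)
            prune.remove(module, "weight")
    return model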

Model Comparison 📊

| Version | Model Size | GPU VRAM Usage | Parameters | Relative Performance |
|---|---|---|---|---|
| Original (DeepSeek-Coder-6.7B) | 3.51 GB | 7.85 GB | 6.7B | 100% |
| Optimized (DeepSeek-Light-V1) | 3.51 GB | 3.93 GB (~50% reduction) | 3.5B | ~50% |
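
To sanity-check the VRAM figure on your own hardware, one option is PyTorch's peak-memory counter: call torch.cuda.reset_peak_memory_stats() before loading the model (see How to Use below), then read the peak afterwards. A minimal sketch:

import torch

def report_peak_vram() -> float:
    # Peak GPU memory allocated by PyTorch (GB) since the last reset; this
    # tracks the caching allocator, so nvidia-smi may report slightly more.
    peak_gb = torch.cuda.max_memory_allocated() / 1024**3
    print(f"Peak GPU memory: {peak_gb:.2f} GB")
    return peak_gb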

Why Use This Model? 💡

  • Runs on more affordable hardware – no need for high-end GPUs.
  • Reduces operational costs – more efficient deployment.
  • Enhances security – allows local execution before moving to production.

How to Use 🛠️

You can load the model with transformers using 4-bit quantization. This requires the transformers, accelerate, and bitsandbytes packages (bitsandbytes supplies the 4-bit kernels):
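
pip install torch transformers accelerate bitsandbytes

Then load and run the model: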

from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch

# Load model and tokenizer
model_name = "sanchezalonsodavid17/DeepSeek_Light_V1"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)

# 4-bit NF4 quantization with bfloat16 compute and nested (double) quantization
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

# Load the quantized model, letting accelerate place layers automatically
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    quantization_config=quantization_config,
)

# Generate text
def generate_text(prompt, max_new_tokens=100):
    # Move inputs to wherever device_map="auto" placed the model
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        # max_new_tokens bounds generated tokens, excluding the prompt length
        output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output[0], skip_special_tokens=True)

# Example usage
prompt = "Explain how deep learning works in neural networks."
response = generate_text(prompt)
print(response)
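
Since the base model is DeepSeek-Coder, code-oriented prompts are a natural fit. An illustrative example using the helper above (the prompt is arbitrary):

# Illustrative: any coding prompt works here
code_prompt = "Write a Python function that checks whether a number is prime."
print(generate_text(code_prompt, max_new_tokens=150))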