---
license: apache-2.0
datasets:
- ahmedgongi/Devops_LLM
language:
- aa
- am
base_model:
- deepseek-ai/DeepSeek-R1-Distill-Qwen-32B
new_version: Phpcool/DeepSeek-R1-Distill-SRE-Qwen-32B-INT8
pipeline_tag: text-generation
library_name: fasttext
tags:
- sre
- devops
- deepseek
---

# DeepSeek-R1-Distill-SRE-Qwen-32B-INT8

## Model Introduction

`DeepSeek-R1-Distill-SRE-Qwen-32B-INT8` is the industry's first publicly available operations large model. It is a mixed-precision, 8-bit quantized large language model fine-tuned from `DeepSeek-R1-Distill-Qwen-32B` and optimized specifically for **operations** and **Site Reliability Engineering (SRE)** scenarios.

This model inherits the powerful reasoning capabilities of the DeepSeek-R1 series and has been further fine-tuned on the [ahmedgongi/Devops_LLM](https://huggingface.co/datasets/ahmedgongi/Devops_LLM) dataset, significantly enhancing its utility in the following tasks:

- Automated script generation
- System monitoring and analysis
- Troubleshooting and root cause identification

The model is suitable for enterprise-level system management, cloud-native operations platform development, and similar scenarios, providing an efficient solution that balances performance and cost for intelligent operations.

The current version uses 8-bit quantization (INT8), implemented with mixed-precision optimization via `bitsandbytes`. Linear layer weights are stored as `torch.int8`, while other components (e.g., Embeddings and LayerNorm) remain in `torch.float16`.

We welcome community users to test the model and share their experiences, helping us improve the model documentation and application scenarios together!

---

## Model Files and Weights

- **Model Files**: The model weights are stored in standard formats supported by Hugging Face (e.g., `.safetensors` or `.bin`) and are located in the root directory of this repository. Example file structure:

  ```
  ├── config.json
  ├── model.safetensors
  ├── tokenizer.json
  └── ...
  ```

- **Quantization Details**: The model uses 8-bit quantization (INT8), with linear layer weights in `torch.int8` and non-quantized parts (e.g., Embeddings, LayerNorm) in `torch.float16`, optimized for mixed precision using `bitsandbytes`.
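To load the weights directly and check the mixed-precision layout described above without a serving framework, the following minimal, untested sketch uses `transformers` (with `accelerate` and `bitsandbytes` installed). The repository id is taken from this card's metadata, and the example prompt is illustrative.

```python
# Minimal sketch (assumption, not an official example): load the INT8 checkpoint
# with transformers. Requires torch, transformers, accelerate, and bitsandbytes.
# The repo id below follows this card's metadata; adjust it to your copy of the weights.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Phpcool/DeepSeek-R1-Distill-SRE-Qwen-32B-INT8"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# Assuming the checkpoint ships its bitsandbytes INT8 quantization config,
# the int8 linear layers are restored automatically; other modules stay in fp16.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",          # spread layers across available GPUs
    torch_dtype=torch.float16,  # non-quantized modules (Embeddings, LayerNorm) in fp16
)

messages = [
    {"role": "system", "content": "You are a senior operations expert."},
    {"role": "user", "content": "Write a bash script that alerts when disk usage exceeds 90%."},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=512)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```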
---

## How to Use the Model for Inference

This model supports efficient inference and has been verified to be compatible with the `vLLM` and `SGLang` frameworks. Below is an example using SGLang (recommended); a vLLM serving sketch appears at the end of this card.

---

### 1. Inference with SGLang

`SGLang` is a high-performance serving framework suitable for fast inference in complex operations tasks.

#### Environment Setup

```bash
pip install sglang
```

#### Start the SGLang Server

```bash
python -m sglang.launch_server --model-path [your-username]/DeepSeek-R1-Distill-SRE-Qwen-32B-INT8 --quant bitsandbytes --port 30000
```

#### Python Inference Example

```python
import openai

client = openai.Client(base_url="http://127.0.0.1:30000/v1", api_key="EMPTY")

# Chat completion
response = client.chat.completions.create(
    model="default",
    messages=[
        {"role": "system", "content": "You are a senior operations expert."},
        {"role": "user", "content": "Analyze the following log and identify possible failure causes: '2023-10-10 12:00:00 ERROR: Disk I/O timeout'."},
    ],
    temperature=0,
    max_tokens=2048,
)
print(response.choices[0].message.content)
```

---

## Model Details

- **Base Model**: `DeepSeek-R1-Distill-Qwen-32B`
- **Fine-Tuning Dataset**: [ahmedgongi/Devops_LLM](https://huggingface.co/datasets/ahmedgongi/Devops_LLM)
- **Quantization**: 8-bit INT8 (linear layer weights), FP16 (Embeddings, LayerNorm, etc.)
- **Compatible Frameworks**: `bitsandbytes`, `vLLM`, `SGLang`
- **Recommended Hardware**: NVIDIA GPU with CUDA support; 2 × 48 GB or more of VRAM is recommended to load the full model

---

## Use Cases

- **Automated Operations**: Script generation, configuration management.
- **System Monitoring**: Metric analysis, alert rule generation.
- **Troubleshooting**: Log parsing, root cause analysis.

The model excels in SRE and DevOps scenarios, particularly in enterprise applications requiring fast response times and resource optimization.

---

## Disclaimer

Due to the nature of language models, generated content may contain hallucinations or biased statements. Please use the model's outputs with caution. If you plan to use this model publicly or commercially, note that the service provider is responsible for any adverse effects or harmful statements resulting from its use. The developers of this project are not liable for any damages or losses caused by the use of this project (including but not limited to data, models, and code).

---

## Community Contributions

Because the current documentation is still limited, we encourage community participation:

- Raise questions, use cases, or improvement suggestions in the Community section on Hugging Face.
- Submit Pull Requests to enhance model details, optimize inference code, or share operations-related prompt examples.

Thank you for your use and support! If you have any questions, feel free to contact us.

Email: liutiansi@gmail.com
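---

## Inference with vLLM (Sketch)

This card lists `vLLM` as a compatible framework but does not include an example. The command below is an untested sketch of serving the INT8 checkpoint with vLLM's OpenAI-compatible server; the repository id follows this card's metadata, and flag names and quantization support should be checked against your installed vLLM version (some versions also require `--load-format bitsandbytes`).

```bash
# Untested sketch: serve the model with vLLM's OpenAI-compatible API server.
# The repo id "Phpcool/DeepSeek-R1-Distill-SRE-Qwen-32B-INT8" is assumed from the card metadata.
vllm serve Phpcool/DeepSeek-R1-Distill-SRE-Qwen-32B-INT8 \
    --quantization bitsandbytes \
    --port 8000
```

The server exposes the same OpenAI-style `/v1/chat/completions` endpoint used in the SGLang example above, so the Python client code can be reused by pointing `base_url` at `http://127.0.0.1:8000/v1`.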