base_model:
- deepseek-ai/DeepSeek-V3-0324
---

# Channel-wise INT8 DeepSeek-V3-0324

The INT8 quant for SGLang (https://github.com/sgl-project/sglang).

[PULL REQUEST](https://github.com/sgl-project/sglang/pull/3888)

## 1. Quantization Process

We apply INT8 quantization to the BF16 checkpoints.

The quantization scales are determined by dividing the channel-wise maximum of the element values by the INT8 type maximum.
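
As a rough illustration of that rule, here is a minimal sketch of symmetric channel-wise INT8 quantization in PyTorch. This is not the repository's `bf16_cast_channel_int8.py` script, and which axis counts as the channel is an assumption here:

```python
import torch

def quantize_channel_int8(weight: torch.Tensor):
    """Symmetric channel-wise INT8 quantization of a BF16/FP32 weight.

    Each channel (assumed to be a row here) gets its own scale:
        scale = max(|w|) over the channel / 127   (127 = INT8 max)
    """
    # Channel-wise maximum of absolute values: one scale per channel.
    max_abs = weight.abs().amax(dim=1, keepdim=True)
    scale = max_abs.clamp(min=1e-12) / 127.0
    # Divide by the scale and round to the nearest INT8 value.
    q = torch.clamp(torch.round(weight / scale), -128, 127).to(torch.int8)
    return q, scale.squeeze(1)

# Dequantization is the inverse: w ≈ q.float() * scale[:, None]
```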

To generate the INT8 weights, run the provided script in the `./inference` directory:

```
python3 bf16_cast_channel_int8.py --input-bf16-hf-path /path/to/bf16-weights/ --output-int8-hf-path /path/to/save-int8-weight/
```

## 2. Troubleshooting

Before running inference, confirm that there is no `quantization_config` attribute in `config.json`.
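
One way to check for (and strip) that attribute is a few lines of Python; the config path below is an assumption based on the output flag of the cast script above:

```python
import json

CONFIG = "/path/to/save-int8-weight/config.json"  # assumed output location

with open(CONFIG) as f:
    cfg = json.load(f)

# Drop the quantization_config entry if it exists, then rewrite the file.
if cfg.pop("quantization_config", None) is not None:
    with open(CONFIG, "w") as f:
        json.dump(cfg, f, indent=2)
    print("Removed quantization_config from config.json")
else:
    print("No quantization_config attribute found; ready for inference.")
```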

---

# DeepSeek-V3-0324