Update README.md
README.md
CHANGED
@@ -11,7 +11,7 @@ This project is created using the official **Deepseek R1** model script (`modeli
 The three hidden layers consist of:
 - **A hidden layer: MLA + Dense MLP**
 - **A hidden layer: MLA + MoE (Mixture of Experts) MLP**
-- **A MTP (Multi-Token Pretraining) layer (MTP can be regarded or used for speculative decoding in inference)
+- **A MTP (Multi-Token Pretraining) layer (MTP can be regarded or used for speculative decoding in inference)**
 
 ## Purpose
 The purpose of these weights is to provide a lightweight implementation for researchers who want to study the model architecture and run experiments quickly.