silence09/DeepSeek-R1-3layers


silence09 committed on Feb 7

Commit a042fd0 · verified · 1 Parent(s): dc69c52

Update README.md

Files changed (1)

README.md +1 -1


@@ -11,7 +11,7 @@ This project is created using the official **Deepseek R1** model script (`modeli
 The three hidden layers consist of:
 - **A hidden layer: MLA + Dense MLP**
 - **A hidden layer: MLA + MoE (Mixture of Experts) MLP**
-- **A MTP (Multi-Token Pretraining) layer (MTP can be regarded or used for speculative decoding in inference) **
+- **A MTP (Multi-Token Pretraining) layer (MTP can be regarded or used for speculative decoding in inference)**
 ## Purpose
 The purpose of these weights is to provide a lightweight implementation for researchers who want to study the model architecture and run experiments quickly.