Update README.md
README.md
CHANGED
@@ -117,7 +117,7 @@ The model, NT-Java-1.1B, has been trained on publicly available datasets and com
 
 ## Model
 
-- **Architecture:** GPT-2 model with
+- **Architecture:** GPT-2 model with Multi-Query Attention and Fill-in-the-Middle objective
 - **•Fine-training steps:** 50k
 - **Pretraining tokens:** 22 Billion
 - **Precision:** bfloat16