Update README.md
README.md CHANGED

@@ -126,7 +126,7 @@ The NT-Java-1.1B model has been trained on publicly available datasets and is of
 ## Model
 
 - **Architecture:** GPT-2 model with Multi-Query Attention and Fill-in-the-Middle objective.
-- **Pretraining steps:**
+- **Pretraining steps:** 100K
 - **Context length:** 8K tokens
 - **Pretraining tokens:** 22 billion
 - **Precision:** bfloat16
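
For reference, a minimal sketch of how the spec in the hunk above (Fill-in-the-Middle objective, bfloat16 precision) translates into usage with Hugging Face transformers. The model id `infosys/NT-Java-1.1B` and the `<fim_prefix>`/`<fim_suffix>`/`<fim_middle>` sentinel tokens are assumptions carried over from the StarCoder-family convention, not taken from this diff; check the model card for the authoritative names.

```python
# Sketch only: model id and FIM sentinel tokens are assumed, not confirmed by this diff.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "infosys/NT-Java-1.1B"  # assumed Hugging Face model id
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(
    checkpoint,
    torch_dtype=torch.bfloat16,  # matches the bfloat16 precision listed above
)

# Fill-in-the-Middle prompt: the model generates the code between prefix and suffix.
prompt = (
    "<fim_prefix>public int add(int a, int b) {\n    return "
    "<fim_suffix>\n}<fim_middle>"
)
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=16)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```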