Update README.md
README.md CHANGED
@@ -70,10 +70,6 @@ Training code can be found at https://github.com/llm-jp/llm-jp-modernbert
 
 The blank in stage 2 indicates the same value as in stage 1.
 
-In theory, stage 1 consumes 1.7T tokens, but sentences with fewer than 1024 tokens are padded, so the actual consumption is lower. Stage 2 theoretically consumes 0.6T tokens.
-
-For reference, [ModernBERT](https://arxiv.org/abs/2412.13663) uses 1.72T tokens for stage 1, 250B tokens for stage 2, and 50B tokens for stage 3.
-
 ## Evaluation
 
 JSTS, JNLI, and JCoLA from [JGLUE](https://aclanthology.org/2022.lrec-1.317/) were used.
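
For context on the token figures removed above: the theoretical budget is simply max_seq_len × global_batch_size × training_steps, and padding of sequences shorter than max_seq_len is what pulls the realized count below it. A minimal back-of-the-envelope sketch follows; the batch size, step count, and average sequence length are illustrative assumptions, not values from the llm-jp-modernbert training config.

```python
# Back-of-the-envelope token budget for the figures removed above.
# All values below are illustrative assumptions, NOT the actual
# llm-jp-modernbert training configuration.

max_seq_len = 1024          # stage 1 context length (from the README)
global_batch_size = 4096    # assumed
training_steps = 400_000    # assumed

# Theoretical upper bound: every slot in every batch holds a real token.
theoretical = max_seq_len * global_batch_size * training_steps
print(f"theoretical: {theoretical / 1e12:.2f}T tokens")  # ~1.68T

# Sequences shorter than max_seq_len are padded, and pad tokens carry no
# training signal, so the realized count is strictly lower.
avg_real_tokens = 700       # assumed average non-pad tokens per sequence
actual = avg_real_tokens * global_batch_size * training_steps
print(f"actual:      {actual / 1e12:.2f}T tokens")  # ~1.15T
```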