Update README.md
README.md CHANGED
@@ -70,10 +70,6 @@ Training code can be found at https://github.com/llm-jp/llm-jp-modernbert
 
 The blank in stage 2 indicates the same value as in stage 1.
 
-In theory, stage 1 consumes 1.7T tokens, but sentences with fewer than 1024 tokens are padded, so the actual consumption is lower. Stage 2 theoretically consumes 0.6T tokens.
-
-For reference, [ModernBERT](https://arxiv.org/abs/2412.13663) uses 1.72T tokens for stage 1, 250B tokens for stage 2, and 50B tokens for stage 3.
-
 ## Evaluation
 
 JSTS, JNLI, and JCoLA from [JGLUE](https://aclanthology.org/2022.lrec-1.317/) were used.
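
For context on the token figures removed above: the theoretical budget is simply max_seq_len × global_batch_size × training_steps, and padding of sequences shorter than max_seq_len is what pulls the realized count below it. A minimal back-of-the-envelope sketch follows; the batch size, step count, and average sequence length are illustrative assumptions, not values from the llm-jp-modernbert training config.

```python
# Back-of-the-envelope token budget for the figures removed above.
# All values below are illustrative assumptions, NOT the actual
# llm-jp-modernbert training configuration.

max_seq_len = 1024          # stage 1 context length (from the README)
global_batch_size = 4096    # assumed
training_steps = 400_000    # assumed

# Theoretical upper bound: every slot in every batch holds a real token.
theoretical = max_seq_len * global_batch_size * training_steps
print(f"theoretical: {theoretical / 1e12:.2f}T tokens")  # ~1.68T

# Sequences shorter than max_seq_len are padded, and pad tokens carry no
# training signal, so the realized count is strictly lower.
avg_real_tokens = 700       # assumed average non-pad tokens per sequence
actual = avg_real_tokens * global_batch_size * training_steps
print(f"actual:      {actual / 1e12:.2f}T tokens")  # ~1.15T
```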