Update README.md
README.md CHANGED
@@ -49,6 +49,8 @@ print("Predicted token:", predicted_token)

This model was trained with a max_seq_len of 1024 in stage 1, and then with a max_seq_len of 8192 in stage 2.

+Training code can be found at https://github.com/llm-jp/bert-ja
+
| Model              |         stage 1 |         stage 2 |
|:------------------ |----------------:|----------------:|
| max_seq_len        |            1024 |            8192 |
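A minimal sketch of what the stage-2 context extension enables: encoding a document up to 8192 tokens rather than the stage-1 limit of 1024. The checkpoint ID below is a placeholder, not the actual model name; substitute this repository's model ID.

```python
# Minimal sketch: encoding a long document up to the stage-2 context length.
# NOTE: "llm-jp/<model-id>" is a placeholder, not a real checkpoint name.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

model_name = "llm-jp/<model-id>"  # placeholder: use this repository's model ID
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)

long_text = "日本語の長い文書。" * 2000  # stand-in for a long Japanese document
# Stage 2 raises the usable window from 1024 to 8192 tokens.
inputs = tokenizer(long_text, truncation=True, max_length=8192, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.shape)  # (1, sequence_length, vocab_size)
```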
@@ -74,6 +76,7 @@ For reference, Warner et al.'s ModernBERT uses 1.72T tokens for stage 1, 250B to
## Evaluation

For the sentence classification task evaluation, the datasets JSTS, JNLI, and JCoLA from [JGLUE](https://aclanthology.org/2022.lrec-1.317/) were used. For the evaluation of the Zero-shot Sentence Retrieval task, the [miracl/miracl](https://huggingface.co/datasets/miracl/miracl) dataset (ja subset) was used.
+
Evaluation code can be found at https://github.com/speed1313/bert-eval

| Model | JSTS | JNLI | JCoLA | Avg(JGLUE) | miracl | Avg |
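The actual evaluation code is the bert-eval repository linked above; the following is only an illustrative sketch of a zero-shot sentence retrieval setup. Mean pooling over the final hidden states and cosine-similarity ranking are assumptions for illustration, not necessarily what bert-eval does, and the checkpoint ID is again a placeholder.

```python
# Illustrative sketch of zero-shot sentence retrieval scoring (an assumption,
# not the bert-eval implementation): mean-pool the final hidden states over
# non-padding tokens, then rank passages by cosine similarity to the query.
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "llm-jp/<model-id>"  # placeholder: use this repository's model ID
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

def embed(texts):
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state     # (batch, seq, dim)
    mask = batch["attention_mask"].unsqueeze(-1).float()  # zero out padding
    pooled = (hidden * mask).sum(dim=1) / mask.sum(dim=1)
    return torch.nn.functional.normalize(pooled, dim=-1)

query = embed(["日本の首都はどこですか。"])
passages = embed(["東京は日本の首都です。", "富士山は日本一高い山です。"])
scores = query @ passages.T  # cosine similarities; higher = more relevant
print(scores)
```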