Fill-Mask · Transformers · Safetensors · Japanese · modernbert
speed committed (verified)
Commit a9f9f2e · 1 parent: 8b76fd0

Update README.md

Files changed (1): README.md (+3, −0)
README.md CHANGED
@@ -49,6 +49,8 @@ print("Predicted token:", predicted_token)
 
 This model was trained with a max_seq_len of 1024 in stage 1, and then with a max_seq_len of 8192 in stage 2.
 
+Training code can be found at https://github.com/llm-jp/bert-ja
+
 | Model | stage 1 | stage 2 |
 |:------------------ |----------------:|----------------:|
 | max_seq_len | 1024 | 8192 |
@@ -74,6 +76,7 @@ For reference, Warner et al.'s ModernBERT uses 1.72T tokens for stage 1, 250B to
 ## Evaluation
 
 For the sentence classification task evaluation, the datasets JSTS, JNLI, and JCoLA from [JGLUE](https://aclanthology.org/2022.lrec-1.317/) were used. For the evaluation of the Zero-shot Sentence Retrieval task, the [miracl/miracl](https://huggingface.co/datasets/miracl/miracl) dataset (ja subset) was used.
+
 Evaluation code can be found at https://github.com/speed1313/bert-eval
 
 | Model | JSTS | JNLI | JCoLA | Avg(JGLUE) | miracl | Avg |
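
For context, the first hunk header quotes `print("Predicted token:", predicted_token)` from the README's fill-mask usage example. A minimal sketch of that kind of usage with Hugging Face Transformers is below; the model ID `llm-jp/modernbert-ja-130m` is a placeholder assumption (the excerpt does not name the repository ID), and the helper logic around the mask position is illustrative rather than the README's exact code.

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

# Placeholder model ID; substitute this repository's actual ID.
model_id = "llm-jp/modernbert-ja-130m"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)

# Japanese fill-mask example: "The capital of Japan is <mask>."
# Use tokenizer.mask_token rather than hard-coding the mask string.
text = f"日本の首都は{tokenizer.mask_token}です。"
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Locate the masked position and take the highest-scoring token there.
mask_index = (inputs.input_ids == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
predicted_id = logits[0, mask_index].argmax(dim=-1)
predicted_token = tokenizer.decode(predicted_id)
print("Predicted token:", predicted_token)
```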
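Since the change documents two-stage training (max_seq_len 1024 in stage 1, then 8192 in stage 2), a quick hedged check that the released checkpoint exposes the stage-2 context length is sketched below. It relies on standard Transformers config fields and the same placeholder model ID as above, and assumes the final checkpoint was exported after stage 2.

```python
from transformers import AutoConfig, AutoTokenizer

model_id = "llm-jp/modernbert-ja-130m"  # placeholder, as above

config = AutoConfig.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# max_position_embeddings should match the stage-2 max_seq_len (8192)
# if the published weights come from the end of stage 2.
print("max_position_embeddings:", config.max_position_embeddings)
print("tokenizer model_max_length:", tokenizer.model_max_length)
```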