Update README.md
README.md CHANGED
@@ -49,6 +49,8 @@ print("Predicted token:", predicted_token)

This model was trained with a max_seq_len of 1024 in stage 1, and then with a max_seq_len of 8192 in stage 2.

+Training code can be found at https://github.com/llm-jp/bert-ja
+
| Model              |         stage 1 |         stage 2 |
|:------------------ |----------------:|----------------:|
| max_seq_len        |            1024 |            8192 |
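A minimal sketch of what the stage-2 context extension enables: encoding a document up to 8192 tokens rather than the stage-1 limit of 1024. The checkpoint ID below is a placeholder, not the actual model name; substitute this repository's model ID.

```python
# Minimal sketch: encoding a long document up to the stage-2 context length.
# NOTE: "llm-jp/<model-id>" is a placeholder, not a real checkpoint name.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

model_name = "llm-jp/<model-id>"  # placeholder: use this repository's model ID
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)

long_text = "日本語の長い文書。" * 2000  # stand-in for a long Japanese document
# Stage 2 raises the usable window from 1024 to 8192 tokens.
inputs = tokenizer(long_text, truncation=True, max_length=8192, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.shape)  # (1, sequence_length, vocab_size)
```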
@@ -74,6 +76,7 @@ For reference, Warner et al.'s ModernBERT uses 1.72T tokens for stage 1, 250B to
## Evaluation

For the sentence classification task evaluation, the datasets JSTS, JNLI, and JCoLA from [JGLUE](https://aclanthology.org/2022.lrec-1.317/) were used. For the evaluation of the Zero-shot Sentence Retrieval task, the [miracl/miracl](https://huggingface.co/datasets/miracl/miracl) dataset (ja subset) was used.
+
Evaluation code can be found at https://github.com/speed1313/bert-eval

| Model | JSTS | JNLI | JCoLA | Avg(JGLUE) | miracl | Avg |
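The actual evaluation code is the bert-eval repository linked above; the following is only an illustrative sketch of a zero-shot sentence retrieval setup. Mean pooling over the final hidden states and cosine-similarity ranking are assumptions for illustration, not necessarily what bert-eval does, and the checkpoint ID is again a placeholder.

```python
# Illustrative sketch of zero-shot sentence retrieval scoring (an assumption,
# not the bert-eval implementation): mean-pool the final hidden states over
# non-padding tokens, then rank passages by cosine similarity to the query.
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "llm-jp/<model-id>"  # placeholder: use this repository's model ID
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

def embed(texts):
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state     # (batch, seq, dim)
    mask = batch["attention_mask"].unsqueeze(-1).float()  # zero out padding
    pooled = (hidden * mask).sum(dim=1) / mask.sum(dim=1)
    return torch.nn.functional.normalize(pooled, dim=-1)

query = embed(["日本の首都はどこですか。"])
passages = embed(["東京は日本の首都です。", "富士山は日本一高い山です。"])
scores = query @ passages.T  # cosine similarities; higher = more relevant
print(scores)
```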