Update README.md

README.md (CHANGED)
@@ -7,7 +7,7 @@ language:
---
<div align="center">

# TinyLlama-1.1B-v1.1
</div>

https://github.com/jzhang38/TinyLlama

@@ -26,7 +26,7 @@ Due to these issues([bug1](https://whimsical-aphid-86d.notion.site/Release-of-Ti

#### Basic pretraining

In this initial phase, we trained our model on SlimPajama alone to develop its commonsense reasoning capabilities. The model was trained on 1.5T tokens during this basic pretraining period. Since we used a cluster with 4 A100-40G GPUs per node and only sharded model weights within a node, we could only set the batch size to approximately 1.8M tokens at this stage.
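To make the ~1.8M figure concrete, here is a rough sketch of how a global batch size in tokens comes together; the per-GPU micro-batch, node count, and gradient-accumulation steps below are illustrative assumptions, not the actual TinyLlama training configuration.

```
# Hypothetical numbers chosen to show how a ~1.8M-token global batch can arise;
# they are not the real TinyLlama training settings.
seq_len = 2048             # tokens per sequence (Llama-style context length)
micro_batch_per_gpu = 8    # sequences per GPU per forward/backward pass (assumed)
gpus_per_node = 4          # A100-40G GPUs per node, as stated above
num_nodes = 14             # assumed node count
grad_accum_steps = 2       # assumed gradient-accumulation steps

tokens_per_step = seq_len * micro_batch_per_gpu * gpus_per_node * num_nodes * grad_accum_steps
print(f"{tokens_per_step / 1e6:.2f}M tokens per optimizer step")  # ~1.84M
```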

#### Continual pretraining with specific domain

@@ -36,26 +36,26 @@ At the begining ~6B tokens in this stage, we linearly increased the sampling pro

#### Cooldown

Implementing a cooldown phase has become a crucial technique for achieving better model convergence at the end of pretraining. However, since we had already used a cosine learning-rate schedule from the beginning, it is challenging to alter the learning rate for cooldown the way MiniCPM or DeepSeek does. Therefore, we cool down by adjusting the batch size instead: we increase the batch size from 1.8M to 7.2M tokens while keeping the original cosine learning-rate schedule during the cooldown stage.
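As a rough illustration of this style of cooldown (the total steps and learning-rate bounds below are placeholder assumptions, not the values used for TinyLlama), the cosine schedule is left untouched while the token batch size steps up for the final stretch:

```
import math

# Placeholder schedule values for illustration only; not the real TinyLlama settings.
max_steps = 10_000             # total optimizer steps (assumed)
max_lr, min_lr = 4e-4, 4e-5    # cosine learning-rate bounds (assumed)
cooldown_start = 8_000         # step at which the batch-size "cooldown" begins (assumed)

def learning_rate(step: int) -> float:
    """Unchanged cosine decay from max_lr to min_lr over the whole run."""
    progress = step / max_steps
    return min_lr + 0.5 * (max_lr - min_lr) * (1 + math.cos(math.pi * progress))

def batch_size_tokens(step: int) -> float:
    """Global batch size jumps from ~1.8M to ~7.2M tokens when cooldown begins."""
    return 1.8e6 if step < cooldown_start else 7.2e6

for step in (0, 4_000, 8_000, 10_000):
    print(f"step {step}: lr={learning_rate(step):.2e}, batch={batch_size_tokens(step)/1e6:.1f}M tokens")
```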

#### TinyLlama model family

Following an extensive and detailed pretraining process, we are now releasing three specialized versions of our model:

1. **TinyLlama_v1.1**: The standard version, used for general purposes.
2. **TinyLlama_v1.1_math_code**: Equipped with better ability for math and code.
3. **TinyLlama_v1.1_chinese**: Good understanding capacity for Chinese.

### How to use

You will need transformers>=4.31.
Do check the [TinyLlama](https://github.com/jzhang38/TinyLlama) GitHub page for more information.

```
from transformers import AutoTokenizer
import transformers
import torch

model = "TinyLlama/TinyLlama_v1.1"
tokenizer = AutoTokenizer.from_pretrained(model)
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    # standard text-generation pipeline settings: fp16 weights, automatic device placement
    torch_dtype=torch.float16,
    device_map="auto",
)
```
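As a quick way to exercise the pipeline above (the prompt and sampling settings here are illustrative placeholders rather than the ones from the full README, which continues with its own generation loop):

```
# Illustrative generation call; prompt and sampling parameters are placeholders.
sequences = pipeline(
    "The TinyLlama project aims to pretrain a 1.1B Llama model on 3 trillion tokens.",
    do_sample=True,
    top_k=10,
    num_return_sequences=1,
    max_length=200,
)
for seq in sequences:
    print(f"Result: {seq['generated_text']}")
```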

@@ -82,4 +82,4 @@ for seq in sequences:

| ----------------------------------------- | --------------- | --------- | --------- | ---------- | --------- | --------- | ----- | --------- | --------- |
| Pythia-1.0B | 300B | 47.16 | 31.40 | 53.43 | 27.05 | 48.99 | 60.83 | 69.21 | 48.30 |
| TinyLlama-1.1B-intermediate-step-1431k-3T | 3T | 59.20 | 36.00 | 59.12 | 30.12 | 55.25 | 57.83 | 73.29 | 52.99 |
| TinyLlama-1.1B-v1.1 | 2T | **61.47** | **36.80** | **59.43** | **32.68** | **55.47** | 55.99 | **73.56** | **53.63** |