Text Generation · Transformers · PyTorch · English · llama · text-generation-inference

PY007 committed (verified) · Commit 5334c8e · 1 Parent(s): afff329

Update README.md

Files changed (1): README.md (+9 -9)
@@ -7,7 +7,7 @@ language:
 ---
 <div align="center">
 
-# TinyLlama-1.1B-v2
+# TinyLlama-1.1B-v1.1
 </div>
 
 https://github.com/jzhang38/TinyLlama
@@ -26,7 +26,7 @@ Due to these issues([bug1](https://whimsical-aphid-86d.notion.site/Release-of-Ti
 
 #### Basic pretraining
 
-In this initial phase, we manage to train our model with language-only corpus (slimpajama) to develop its commonsense reasoning capabilities. The model was trained with 1.5T tokens during this basic pretraining period. Due to memory constraints, we set the batch size to approximately 1.8M.
+In this initial phase, we managed to train our model with only slimpajama to develop its commonsense reasoning capabilities. The model was trained with 1.5T tokens during this basic pretraining period. Since we used a cluster with 4 A100-40G GPUs per node and only sharded model weights within a node, we could only set the batch size to approximately 1.8M tokens this time.
 
 #### Continual pretraining with specific domain
 
@@ -36,26 +36,26 @@ At the begining ~6B tokens in this stage, we linearly increased the sampling pro
 
 #### Cooldown
 
-Implementing a cooldown phase has become a crucial technique to achieve better model convergence at the end of pretraining. However, since we have already use cosine learning rate strategy at the beginning, it becomes challenging to alter the learning rate for cooldown like what MiniCPM or deepseek does. Therefore, we try to cool down with adjusting our batch size. Specifically, we increase our batch size from 1.8M to 7.2M while keeping the original cosine learning rate schedule during our cooldown stage.
+Implementing a cooldown phase has become a crucial technique for achieving better model convergence at the end of pretraining. However, since we had already used a cosine learning rate schedule from the beginning, it is challenging to alter the learning rate for cooldown as MiniCPM or DeepSeek do. Therefore, we cool down by adjusting our batch size instead. Specifically, we increase our batch size from 1.8M to 7.2M tokens while keeping the original cosine learning rate schedule during the cooldown stage.
 
 #### TinyLlama model family
 
 Following an extensive and detailed pretraining process, we are now releasing three specialized versions of our model:
 
-1. **TinyLlama_v2**: The standard version, used for general purposes.
-2. **TinyLlama_v2_math_code**: Equipped with better ability for math and code.
-3. **TinyLlama_v2_chinese**: Good understanding capacity for Chinese language.
+1. **TinyLlama_v1.1**: The standard version, for general-purpose use.
+2. **TinyLlama_v1.1_math_code**: Equipped with better math and code abilities.
+3. **TinyLlama_v1.1_chinese**: Good understanding of Chinese.
 
 
 
 ### How to use
 You will need transformers>=4.31.
-Do check the [TinyLlama](https://github.com/jzhang38/TinyLlama) github page for more information.
+Do check the [TinyLlama](https://github.com/jzhang38/TinyLlama) GitHub page for more information.
 ```
 from transformers import AutoTokenizer
 import transformers
 import torch
-model = "TinyLlama/TinyLlama_v2"
+model = "TinyLlama/TinyLlama_v1.1"
 tokenizer = AutoTokenizer.from_pretrained(model)
 pipeline = transformers.pipeline(
     "text-generation",
@@ -82,4 +82,4 @@ for seq in sequences:
 | ----------------------------------------- | --------------- | --------- | --------- | ---------- | --------- | --------- | ----- | --------- | --------- |
 | Pythia-1.0B | 300B | 47.16 | 31.40 | 53.43 | 27.05 | 48.99 | 60.83 | 69.21 | 48.30 |
 | TinyLlama-1.1B-intermediate-step-1431k-3T | 3T | 59.20 | 36.00 | 59.12 | 30.12 | 55.25 | 57.83 | 73.29 | 52.99 |
-| TinyLlama-1.1B-v2 | 2T | **61.47** | **36.80** | **59.43** | **32.68** | **55.47** | 55.99 | **73.56** | **53.63** |
+| TinyLlama-1.1B-v1.1 | 2T | **61.47** | **36.80** | **59.43** | **32.68** | **55.47** | 55.99 | **73.56** | **53.63** |
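As a rough sanity check on the batch-size figure in the basic-pretraining paragraph above: the global batch of roughly 1.8M is most naturally read as tokens per optimizer step. A minimal sketch of how such a budget could decompose; apart from the 4 GPUs per node stated in the model card, every number below is an illustrative assumption, not a value from the TinyLlama report.

```
# Hypothetical decomposition of a ~1.8M-token global batch.
# Only "4 GPUs per node" comes from the model card; all other numbers are
# illustrative assumptions.
SEQ_LEN = 2048              # assumed context length
GPUS_PER_NODE = 4           # A100-40G per node (from the model card)
NODES = 28                  # assumed node count
MICRO_BATCH_PER_GPU = 8     # assumed sequences per GPU per step
GRAD_ACCUM_STEPS = 1        # assumed gradient-accumulation steps

tokens_per_step = (SEQ_LEN * GPUS_PER_NODE * NODES
                   * MICRO_BATCH_PER_GPU * GRAD_ACCUM_STEPS)
print(f"{tokens_per_step:,} tokens per optimizer step")  # 1,835,008 ≈ 1.8M
```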
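The Cooldown paragraph above cools the model down through batch size rather than learning rate: the cosine schedule is left untouched while the per-step token budget jumps from roughly 1.8M to 7.2M tokens. A self-contained sketch of that idea; the total step count, warmup length, cooldown boundary, and peak/minimum learning rates are placeholders, not values from the TinyLlama report.

```
import math

# Placeholder schedule constants; only the 1.8M -> 7.2M batch-size jump and the
# unchanged cosine learning rate reflect the description above.
TOTAL_STEPS = 1_000_000
WARMUP_STEPS = 2_000
COOLDOWN_START = 900_000
MAX_LR, MIN_LR = 4e-4, 4e-5

def learning_rate(step: int) -> float:
    """Linear warmup followed by cosine decay; the cooldown does not alter it."""
    if step < WARMUP_STEPS:
        return MAX_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return MIN_LR + 0.5 * (MAX_LR - MIN_LR) * (1 + math.cos(math.pi * progress))

def batch_size_tokens(step: int) -> int:
    """Cooldown is expressed through the token budget per optimizer step."""
    return 7_200_000 if step >= COOLDOWN_START else 1_800_000

for step in (0, 450_000, 899_999, 900_000, 1_000_000):
    print(f"step {step:>9,}: lr={learning_rate(step):.2e}, "
          f"batch={batch_size_tokens(step):,} tokens")
```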
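The hunks above only show the opening lines of the usage snippet; the rest falls outside the diff context (the last hunk header shows it ends with a `for seq in sequences:` loop). A minimal, self-contained sketch of the full call, assuming the standard transformers text-generation pipeline; the prompt and sampling parameters are placeholders rather than the model card's exact values.

```
from transformers import AutoTokenizer
import transformers
import torch

model = "TinyLlama/TinyLlama_v1.1"
tokenizer = AutoTokenizer.from_pretrained(model)
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    torch_dtype=torch.float16,  # assumes a GPU with fp16 support
    device_map="auto",
)

# Placeholder prompt and sampling settings for illustration.
sequences = pipeline(
    "The TinyLlama project aims to pretrain a 1.1B Llama model on 3 trillion tokens.",
    do_sample=True,
    top_k=10,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
    max_length=200,
)
for seq in sequences:
    print(f"Result: {seq['generated_text']}")
```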