prepare datasets
- README.md +5 -35
- scripts/pretrain-core-model-0.yaml +4 -3
README.md
CHANGED
@@ -44,7 +44,7 @@ tags:
 - reason
 ---
 
-# tangled-alpha-0.
+# tangled-alpha-0.3-core
 
 
 
@@ -53,44 +53,14 @@ time python -B prepare_core_datasets.py
 ```
 
 ```
-
-Workers are finished.██| 220/220 [23:15<00:00, 6.34s/it]
-Finished data processing!
-i=0, block_size=8192, chunk_size=16384000, len(dataset)=893355, len(dataset) * block_size=7318364160
-Total number of tokens in the optimized dataset '../core-data-0-8192-2000' is 7318364160
+# ...
 ```
 
 ```bash
-CUDA_VISIBLE_DEVICES=0 CUDA_LAUNCH_BLOCKING=0 PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True litgpt pretrain --config pretrain-core-model.yaml
+CUDA_VISIBLE_DEVICES=0 CUDA_LAUNCH_BLOCKING=0 PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True litgpt pretrain --config pretrain-core-model-0.yaml
 ```
 
 ```
-Seed set to 23
-Time to instantiate model: 0.32 seconds.
-Total parameters: 217,088,512
-Verifying settings ...
-Measured TFLOPs: 3548.40
-
-Epoch 1 | iter 256 step 1 | loss train: 11.716, val: n/a | iter time: 1735.26 ms (step) remaining time: 4 days, 11:06:29
-Epoch 1 | iter 512 step 2 | loss train: 11.534, val: n/a | iter time: 1102.77 ms (step) remaining time: 4 days, 2:31:30
-Epoch 1 | iter 768 step 3 | loss train: 11.356, val: n/a | iter time: 1095.87 ms (step) remaining time: 3 days, 23:44:12
-Epoch 1 | iter 1024 step 4 | loss train: 11.162, val: n/a | iter time: 1099.92 ms (step) remaining time: 3 days, 22:18:27
-Epoch 1 | iter 1280 step 5 | loss train: 11.018, val: n/a | iter time: 1096.45 ms (step) remaining time: 3 days, 21:24:35
-Epoch 1 | iter 1536 step 6 | loss train: 10.901, val: n/a | iter time: 1093.65 ms (step) remaining time: 3 days, 20:48:11
-Epoch 1 | iter 1792 step 7 | loss train: 10.850, val: n/a | iter time: 1100.16 ms (step) remaining time: 3 days, 20:22:00
-Epoch 1 | iter 2048 step 8 | loss train: 10.780, val: n/a | iter time: 1092.67 ms (step) remaining time: 3 days, 20:01:57
-Epoch 1 | iter 2304 step 9 | loss train: 10.692, val: n/a | iter time: 1095.77 ms (step) remaining time: 3 days, 19:45:57
-Epoch 1 | iter 2560 step 10 | loss train: 10.678, val: n/a | iter time: 1092.12 ms (step) remaining time: 3 days, 19:32:43
-Epoch 1 | iter 2816 step 11 | loss train: 10.619, val: n/a | iter time: 1094.44 ms (step) remaining time: 3 days, 19:21:32
-Epoch 1 | iter 3072 step 12 | loss train: 10.588, val: n/a | iter time: 1102.51 ms (step) remaining time: 3 days, 19:12:30
-Epoch 1 | iter 3328 step 13 | loss train: 10.514, val: n/a | iter time: 1095.57 ms (step) remaining time: 3 days, 19:04:07
-Epoch 1 | iter 3584 step 14 | loss train: 10.472, val: n/a | iter time: 1104.00 ms (step) remaining time: 3 days, 18:56:56
-Epoch 1 | iter 3840 step 15 | loss train: 10.431, val: n/a | iter time: 1096.00 ms (step) remaining time: 3 days, 18:50:21
-Epoch 1 | iter 4096 step 16 | loss train: 10.392, val: n/a | iter time: 1098.34 ms (step) remaining time: 3 days, 18:44:25
-Epoch 1 | iter 4352 step 17 | loss train: 10.360, val: n/a | iter time: 1106.53 ms (step) remaining time: 3 days, 18:38:58
-Epoch 1 | iter 4608 step 18 | loss train: 10.329, val: n/a | iter time: 1084.95 ms (step) remaining time: 3 days, 18:33:58
-Epoch 1 | iter 4864 step 19 | loss train: 10.296, val: n/a | iter time: 1096.22 ms (step) remaining time: 3 days, 18:29:12
-Epoch 1 | iter 5120 step 20 | loss train: 10.236, val: n/a | iter time: 1093.39 ms (step) remaining time: 3 days, 18:24:51
 # ...
 ```
 
@@ -103,11 +73,11 @@ mv wandb wandb-pretrain-core
 Chat with model:
 
 ```bash
-CUDA_VISIBLE_DEVICES=0 CUDA_LAUNCH_BLOCKING=0 PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True litgpt chat ../out/pretrain-core/final
+CUDA_VISIBLE_DEVICES=0 CUDA_LAUNCH_BLOCKING=0 PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True litgpt chat ../out/pretrain-core-0/final
 ```
 
 ```bash
-CUDA_VISIBLE_DEVICES=0 CUDA_LAUNCH_BLOCKING=0 PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True time litgpt evaluate --tasks 'leaderboard' --out_dir '../evaluate/pretrain-core/leaderboard/' --batch_size 1 --dtype 'bfloat16' '../out/pretrain-core/final'
+CUDA_VISIBLE_DEVICES=0 CUDA_LAUNCH_BLOCKING=0 PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True time litgpt evaluate --tasks 'leaderboard' --out_dir '../evaluate/pretrain-core-0/leaderboard/' --batch_size 1 --dtype 'bfloat16' '../out/pretrain-core-0/final'
 ```
 
 ```
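The log output removed from the README reports the dataset size as `len(dataset) * block_size`. As a quick sanity check, the sketch below reproduces that arithmetic using only the numbers printed in that log. The variable names are illustrative and do not come from `prepare_core_datasets.py`.

```python
# Values copied from the log output that this commit removes from the README.
num_blocks = 893_355       # len(dataset)
block_size = 8_192         # tokens per block
chunk_size = 16_384_000    # tokens per chunk

# Total tokens in the optimized dataset, as reported in the log.
total_tokens = num_blocks * block_size
print(f"{total_tokens:,}")  # 7,318,364,160

# Blocks per chunk; this appears to correspond to the "8192-2000" suffix
# of the dataset directory name, though that reading is an assumption.
blocks_per_chunk = chunk_size // block_size
print(blocks_per_chunk)  # 2000
```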
scripts/pretrain-core-model-0.yaml
CHANGED
@@ -25,7 +25,7 @@ model_config:
 
 # Directory in which to save checkpoints and logs. If running in a Lightning Studio Job, look for it in
 # /teamspace/jobs/<job-name>/share. (type: <class 'Path'>, default: out/pretrain)
-out_dir: "../out/pretrain-core/"
+out_dir: "../out/pretrain-core-0/"
 
 # The precision to use for pretraining. Possible choices: "bf16-true", "bf16-mixed", "32-true". (type: Optional[str], default: null)
 # precision: bf16-mixed
@@ -60,6 +60,7 @@ train:
 # Number of samples between optimizer steps across data-parallel ranks (type: int, default: 512)
 global_batch_size: 512
 # global_batch_size: 256
+# global_batch_size: 128
 
 # Number of samples per data-parallel rank (type: int, default: 4)
 micro_batch_size: 4
@@ -67,7 +68,7 @@ train:
 # micro_batch_size: 1
 
 # Number of iterations with learning rate warmup active (type: int, default: 2000)
-lr_warmup_steps:
+lr_warmup_steps: 500
 
 # Number of epochs to train on (type: Optional[int], default: null)
 epochs:
@@ -93,7 +94,7 @@ train:
 # Evaluation-related arguments. See ``litgpt.args.EvalArgs`` for details
 eval:
 # Number of optimizer steps between evaluation calls (type: int, default: 1000)
-interval:
+interval: 100
 
 # Number of tokens to generate (type: Optional[int], default: null)
 max_new_tokens:
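The config comments describe `global_batch_size` as the number of samples between optimizer steps across data-parallel ranks and `micro_batch_size` as the number of samples per rank. A minimal sketch of how those two settings would translate into gradient accumulation follows, assuming the usual relation (per-device batch divided by micro-batch) and a single device, as suggested by `CUDA_VISIBLE_DEVICES=0`. The helper `accumulation_iters` is hypothetical and is not part of the litgpt API.

```python
def accumulation_iters(global_batch_size: int, micro_batch_size: int, devices: int = 1) -> int:
    """Micro-batches accumulated per optimizer step on each device (assumed relation)."""
    per_device_batch = global_batch_size // devices
    if per_device_batch % micro_batch_size != 0:
        raise ValueError("per-device batch size must be a multiple of micro_batch_size")
    return per_device_batch // micro_batch_size


# The active value (512) and the two commented-out alternatives from the YAML.
for gbs in (512, 256, 128):
    print(gbs, accumulation_iters(gbs, micro_batch_size=4))

# With eval interval set to 100 optimizer steps, evaluation would run
# roughly every 100 * accumulation_iters micro-batch iterations.
eval_interval_steps = 100
iters_between_evals = eval_interval_steps * accumulation_iters(512, micro_batch_size=4)
print(iters_between_evals)  # 12800
```

Under these assumptions, halving `global_batch_size` halves the number of micro-batches per optimizer step, so each step gets cheaper while the gradient estimate gets noisier.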