---

license: mit
pipeline_tag: text-generation
library_name: transformers
language: [
    'en', 'am', 'ar', 'as', 'az', 'be', 'bg', 'bn', 'br', 'bs', 'ca', 'cs', 'cy', 'da', 'de', 'el',
    'eo', 'es', 'et', 'eu', 'fa', 'ff', 'fi', 'fr', 'fy', 'ga', 'gd', 'gl', 'gn', 'gu', 'ha', 'he',
    'hi', 'hr', 'ht', 'hu', 'hy', 'id', 'ig', 'is', 'it', 'ja', 'jv', 'ka', 'kk', 'km', 'kn', 'ko',
    'ku', 'ky', 'la', 'lg', 'li', 'ln', 'lo', 'lt', 'lv', 'mg', 'mk', 'ml', 'mn', 'mr', 'ms', 'my',
    'ne', 'nl', 'no', 'ns', 'om', 'or', 'pa', 'pl', 'ps', 'pt', 'qu', 'rm', 'ro', 'ru', 'sa', 'si',
    'sc', 'sd', 'sk', 'sl', 'so', 'sq', 'sr', 'ss', 'su', 'sv', 'sw', 'ta', 'te', 'th', 'tl', 'tn',
    'tr', 'ug', 'uk', 'ur', 'uz', 'vi', 'wo', 'xh', 'yi', 'yo', 'zu',
]
datasets:
# core - base
- ontocord/fineweb-permissive-multilingual-2m
- distily/c4_multilingual_1M
- data-silence/sumnews
- xu-song/cc100-samples
- badrex/llm-emoji-dataset
- fblgit/simple-math
- Gusarich/math-expressions-1m
- neuralwork/arxiver
- christopher/rosetta-code
- nampdn-ai/tiny-codes
- JeanKaddour/minipile
# core - instruct
- NousResearch/hermes-function-calling-v1
- simplescaling/s1K-1.1
# base - instruct
- mlabonne/open-perfectblend
- allenai/tulu-3-sft-mixture
- rombodawg/Everything_Instruct_Multilingual
# base - reason
- open-r1/OpenR1-Math-220k
- open-thoughts/OpenThoughts-114k
- cognitivecomputations/dolphin-r1
- simplescaling/s1K-1.1
tags:
- chat
- core
- base
- instruct
- reason
---


# tangled-alpha-0.1-core

![logo](./misc/logo.jpg)

```bash
time python -B prepare_core_datasets.py
```
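
`prepare_core_datasets.py` tokenizes the source datasets and packs them into fixed 8,192-token blocks written as chunked binary shards (see the log below). A minimal sketch of such a prepare step, assuming `litdata` and a Hugging Face tokenizer; the tokenizer path is a placeholder, and a single stand-in dataset replaces the full mix listed in the card header:

```python
from functools import partial

from datasets import load_dataset
from litdata import optimize, TokensLoader
from transformers import AutoTokenizer


def tokenize_fn(index, dataset, tokenizer):
    # Yield one document's token ids; litdata packs them into contiguous blocks.
    yield tokenizer.encode(dataset[index]["text"], return_tensors="np").squeeze(0)


if __name__ == "__main__":
    # Placeholder: the real script uses the model's own tokenizer.
    tokenizer = AutoTokenizer.from_pretrained("path/to/tokenizer")
    # One stand-in source; the real script mixes all datasets from the card header.
    dataset = load_dataset("JeanKaddour/minipile", split="train")
    optimize(
        fn=partial(tokenize_fn, dataset=dataset, tokenizer=tokenizer),
        inputs=list(range(len(dataset))),
        output_dir="../core-data-0-8192-2000",
        chunk_size=16_384_000,       # chunk_size reported in the log below
        item_loader=TokensLoader(),  # store raw token streams rather than samples
    )
```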

```
Progress: 100%|████████| 220/220 [23:15<00:00,  6.34s/it]
Workers are finished.
Finished data processing!
i=0, block_size=8192, chunk_size=16384000, len(dataset)=893355, len(dataset) * block_size=7318364160
Total number of tokens in the optimized dataset '../core-data-0-8192-2000' is 7318364160
```
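
The reported totals can be cross-checked by streaming the optimized dataset back. A minimal sketch using litdata's reader, with the path and block size taken from the log above:

```python
from litdata import StreamingDataset, TokensLoader

# Each item is one contiguous block of 8,192 token ids.
dataset = StreamingDataset(
    input_dir="../core-data-0-8192-2000",
    item_loader=TokensLoader(block_size=8192),
)
print(len(dataset))          # expected: 893355
print(len(dataset) * 8192)   # expected: 7318364160
```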

```bash
CUDA_VISIBLE_DEVICES=0 CUDA_LAUNCH_BLOCKING=0 PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True litgpt pretrain --config pretrain-core-model.yaml
```

```
Seed set to 23
Time to instantiate model: 0.24 seconds.
Total parameters: 182,125,056
Verifying settings ...
Measured TFLOPs: 7041.81

Epoch 1 | iter 256 step 1 | loss train: 10.529, val: n/a | iter time: 1696.67 ms (step) remaining time: 4 days, 7:44:36
Epoch 1 | iter 512 step 2 | loss train: 10.200, val: n/a | iter time: 1260.46 ms (step) remaining time: 4 days, 2:29:51
Epoch 1 | iter 768 step 3 | loss train: 9.875, val: n/a | iter time: 1246.06 ms (step) remaining time: 4 days, 0:59:11
Epoch 1 | iter 1024 step 4 | loss train: 9.634, val: n/a | iter time: 1245.91 ms (step) remaining time: 4 days, 0:38:01
Epoch 1 | iter 1280 step 5 | loss train: 9.504, val: n/a | iter time: 1248.04 ms (step) remaining time: 4 days, 0:28:49
Epoch 1 | iter 1536 step 6 | loss train: 9.371, val: n/a | iter time: 1220.81 ms (step) remaining time: 4 days, 0:32:52
Epoch 1 | iter 1792 step 7 | loss train: 9.269, val: n/a | iter time: 1238.00 ms (step) remaining time: 4 days, 0:30:03
Epoch 1 | iter 2048 step 8 | loss train: 9.214, val: n/a | iter time: 1244.22 ms (step) remaining time: 4 days, 0:30:30
Epoch 1 | iter 2304 step 9 | loss train: 9.109, val: n/a | iter time: 1220.57 ms (step) remaining time: 4 days, 0:25:37
Epoch 1 | iter 2560 step 10 | loss train: 9.061, val: n/a | iter time: 1251.13 ms (step) remaining time: 4 days, 0:12:57
Epoch 1 | iter 2816 step 11 | loss train: 9.031, val: n/a | iter time: 1241.17 ms (step) remaining time: 4 days, 0:05:06
Epoch 1 | iter 3072 step 12 | loss train: 8.944, val: n/a | iter time: 1280.45 ms (step) remaining time: 4 days, 0:00:31
Epoch 1 | iter 3328 step 13 | loss train: 8.931, val: n/a | iter time: 1241.07 ms (step) remaining time: 4 days, 0:00:08
Epoch 1 | iter 3584 step 14 | loss train: 8.910, val: n/a | iter time: 1229.04 ms (step) remaining time: 3 days, 23:59:03
Epoch 1 | iter 3840 step 15 | loss train: 8.823, val: n/a | iter time: 1239.92 ms (step) remaining time: 3 days, 23:55:02
Epoch 1 | iter 4096 step 16 | loss train: 8.745, val: n/a | iter time: 1239.53 ms (step) remaining time: 3 days, 23:50:02
Epoch 1 | iter 4352 step 17 | loss train: 8.679, val: n/a | iter time: 1271.10 ms (step) remaining time: 3 days, 23:46:19
Epoch 1 | iter 4608 step 18 | loss train: 8.654, val: n/a | iter time: 1246.47 ms (step) remaining time: 3 days, 23:43:27
Epoch 1 | iter 4864 step 19 | loss train: 8.651, val: n/a | iter time: 1246.56 ms (step) remaining time: 3 days, 23:41:11
Epoch 1 | iter 5120 step 20 | loss train: 8.639, val: n/a | iter time: 1219.66 ms (step) remaining time: 3 days, 23:35:38
# ...
Epoch 1 | iter 442880 step 1730 | loss train: 2.740, val: 2.863 | iter time: 1340.98 ms (step) remaining time: 0:51:28
Epoch 1 | iter 443136 step 1731 | loss train: 2.734, val: 2.863 | iter time: 1387.92 ms (step) remaining time: 0:48:00
Epoch 1 | iter 443392 step 1732 | loss train: 2.730, val: 2.863 | iter time: 1309.36 ms (step) remaining time: 0:44:31
Epoch 1 | iter 443648 step 1733 | loss train: 2.715, val: 2.863 | iter time: 1292.23 ms (step) remaining time: 0:41:03
Epoch 1 | iter 443904 step 1734 | loss train: 2.718, val: 2.863 | iter time: 1311.24 ms (step) remaining time: 0:37:35
Epoch 1 | iter 444160 step 1735 | loss train: 2.709, val: 2.863 | iter time: 1291.09 ms (step) remaining time: 0:34:07
Epoch 1 | iter 444416 step 1736 | loss train: 2.723, val: 2.863 | iter time: 1304.14 ms (step) remaining time: 0:30:39
Epoch 1 | iter 444672 step 1737 | loss train: 2.721, val: 2.863 | iter time: 1278.33 ms (step) remaining time: 0:27:10
Epoch 1 | iter 444928 step 1738 | loss train: 2.697, val: 2.863 | iter time: 1292.86 ms (step) remaining time: 0:23:42
Epoch 1 | iter 445184 step 1739 | loss train: 2.763, val: 2.863 | iter time: 1284.40 ms (step) remaining time: 0:20:14
Epoch 1 | iter 445440 step 1740 | loss train: 2.775, val: 2.863 | iter time: 1302.58 ms (step) remaining time: 0:16:46
Epoch 1 | iter 445696 step 1741 | loss train: 2.756, val: 2.863 | iter time: 1298.86 ms (step) remaining time: 0:13:18
Epoch 1 | iter 445952 step 1742 | loss train: 2.728, val: 2.863 | iter time: 1279.11 ms (step) remaining time: 0:09:49
Epoch 1 | iter 446208 step 1743 | loss train: 2.637, val: 2.863 | iter time: 1308.11 ms (step) remaining time: 0:06:21
Epoch 1 | iter 446464 step 1744 | loss train: 2.638, val: 2.863 | iter time: 1294.08 ms (step) remaining time: 0:02:53
Validating ...
Final evaluation | val loss: 2.862 | val ppl: 17.494
Saving checkpoint to '../out/pretrain-core/final/lit_model.pth'
----------------------------------------
| Performance
| - Total tokens  : 7,318,355,968
| - Training Time : 363457.29 s
| - Tok/sec       : 2103064.60 tok/s
| ----------------------------------------
| Memory Usage
| - Memory Used   : 20.93 GB
----------------------------------------
```
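
The reported perplexity is simply the exponential of the validation loss, which is easy to verify:

```python
import math

val_loss = 2.862           # final validation loss from the log above
print(math.exp(val_loss))  # ≈ 17.50, in line with the reported val ppl of 17.494
```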

Back up the `wandb` run directory:

```bash
mv wandb wandb-pretrain-core
```

Chat with the model:

```bash
CUDA_VISIBLE_DEVICES=0 CUDA_LAUNCH_BLOCKING=0 PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True litgpt chat ../out/pretrain-core/final
```
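
Besides the interactive `litgpt chat` REPL, the same checkpoint can be queried from Python via litgpt's high-level API. A minimal sketch; the prompt is illustrative, and since this is a base (core) model, expect plain continuations rather than chat-style answers:

```python
from litgpt import LLM

# Load the checkpoint written by `litgpt pretrain` above.
llm = LLM.load("../out/pretrain-core/final")
print(llm.generate("The capital of France is", max_new_tokens=32))
```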

Evaluate the model:

```bash
CUDA_VISIBLE_DEVICES=0 CUDA_LAUNCH_BLOCKING=0 PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True time litgpt evaluate --tasks 'leaderboard' --out_dir '../evaluate/pretrain-core/leaderboard/' --batch_size 1 --dtype 'bfloat16' '../out/pretrain-core/final'
```

```
# ...
```
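
`litgpt evaluate` delegates to EleutherAI's lm-evaluation-harness. Assuming the harness writes its conventional `results.json` into `--out_dir` (the file name is an assumption, not confirmed by the log), the scores can be inspected like this:

```python
import json
from pathlib import Path

# Assumed output location inside --out_dir; adjust if the harness names it differently.
results = json.loads(Path("../evaluate/pretrain-core/leaderboard/results.json").read_text())
for task, metrics in sorted(results["results"].items()):
    print(task, metrics)
```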