---
license: mit
pipeline_tag: text-generation
library_name: transformers
language: [
'en', 'am', 'ar', 'as', 'az', 'be', 'bg', 'bn', 'br', 'bs', 'ca', 'cs', 'cy', 'da', 'de', 'el',
'eo', 'es', 'et', 'eu', 'fa', 'ff', 'fi', 'fr', 'fy', 'ga', 'gd', 'gl', 'gn', 'gu', 'ha', 'he',
'hi', 'hr', 'ht', 'hu', 'hy', 'id', 'ig', 'is', 'it', 'ja', 'jv', 'ka', 'kk', 'km', 'kn', 'ko',
'ku', 'ky', 'la', 'lg', 'li', 'ln', 'lo', 'lt', 'lv', 'mg', 'mk', 'ml', 'mn', 'mr', 'ms', 'my',
'ne', 'nl', 'no', 'ns', 'om', 'or', 'pa', 'pl', 'ps', 'pt', 'qu', 'rm', 'ro', 'ru', 'sa', 'si',
'sc', 'sd', 'sk', 'sl', 'so', 'sq', 'sr', 'ss', 'su', 'sv', 'sw', 'ta', 'te', 'th', 'tl', 'tn',
'tr', 'ug', 'uk', 'ur', 'uz', 'vi', 'wo', 'xh', 'yi', 'yo', 'zu',
]
datasets:
# core - base
- ontocord/fineweb-permissive-multilingual-2m
- distily/c4_multilingual_1M
- data-silence/sumnews
- xu-song/cc100-samples
- badrex/llm-emoji-dataset
- fblgit/simple-math
- Gusarich/math-expressions-1m
- neuralwork/arxiver
- christopher/rosetta-code
- nampdn-ai/tiny-codes
- JeanKaddour/minipile
# core - instruct
- NousResearch/hermes-function-calling-v1
- simplescaling/s1K-1.1
# base - instruct
- mlabonne/open-perfectblend
- allenai/tulu-3-sft-mixture
- rombodawg/Everything_Instruct_Multilingual
# base - reason
- open-r1/OpenR1-Math-220k
- open-thoughts/OpenThoughts-114k
- cognitivecomputations/dolphin-r1
- simplescaling/s1K-1.1
tags:
- chat
- core
- base
- instruct
- reason
---
# tangled-alpha-0.3-core
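A multilingual, ~185M-parameter text-generation model pretrained from scratch with [litgpt](https://github.com/Lightning-AI/litgpt) on the core datasets listed in the metadata above.

Prepare and pack the core datasets: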

```bash
time python -B prepare_core_datasets.py
```
```
i=0, min_len=0, max_len=1048576, block_size=2049, chunk_size=16392000, len(dataset)=3134311, len(dataset) * block_size=6422203239
Total number of tokens in the optimized dataset '../core-data-0-0-1048576-2049-8000' is 6422203239
i=1, min_len=2049, max_len=8193, block_size=8193, chunk_size=16386000, len(dataset)=179944, len(dataset) * block_size=1474281192
Total number of tokens in the optimized dataset '../core-data-1-2049-8193-8193-2000' is 1474281192
i=2, min_len=8193, max_len=1048577, block_size=32769, chunk_size=16384500, len(dataset)=48261, len(dataset) * block_size=1581464709
Total number of tokens in the optimized dataset '../core-data-2-8193-1048577-32769-500' is 1581464709
```
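Each bucket's reported token count is simply the number of packed samples times its block size; a quick sanity check in plain Python, using the numbers printed above:
```python
# Sanity-check the reported totals: tokens = len(dataset) * block_size.
buckets = [
    (3_134_311, 2_049),   # core-data-0: sequences up to 2,049 tokens
    (179_944, 8_193),     # core-data-1: sequences of 2,049-8,193 tokens
    (48_261, 32_769),     # core-data-2: sequences of 8,193-1,048,577 tokens
]

total = 0
for n_samples, block_size in buckets:
    tokens = n_samples * block_size
    total += tokens
    print(f"{n_samples:>9} * {block_size:>6} = {tokens:,}")

print(f"total packed tokens: {total:,}")  # 9,477,949,140 across all three buckets
```
Pretrain the core model: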
```bash
CUDA_VISIBLE_DEVICES=0 CUDA_LAUNCH_BLOCKING=0 PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True litgpt pretrain --config pretrain-core-model-0.yaml
```
```
Seed set to 23
Time to instantiate model: 0.30 seconds.
Total parameters: 185,631,232
Verifying settings ...
Measured TFLOPs: 14094.64
Epoch 1 | iter 128 step 1 | loss train: 11.709, val: n/a | iter time: 341.75 ms (step) remaining time: 3 days, 20:04:36
Epoch 1 | iter 256 step 2 | loss train: 11.716, val: n/a | iter time: 287.55 ms (step) remaining time: 3 days, 3:29:34
Epoch 1 | iter 384 step 3 | loss train: 11.711, val: n/a | iter time: 290.88 ms (step) remaining time: 2 days, 22:16:53
Epoch 1 | iter 512 step 4 | loss train: 11.706, val: n/a | iter time: 291.81 ms (step) remaining time: 2 days, 19:34:34
Epoch 1 | iter 640 step 5 | loss train: 11.696, val: n/a | iter time: 291.37 ms (step) remaining time: 2 days, 17:59:17
Epoch 1 | iter 768 step 6 | loss train: 11.687, val: n/a | iter time: 290.50 ms (step) remaining time: 2 days, 16:55:49
Epoch 1 | iter 896 step 7 | loss train: 11.675, val: n/a | iter time: 291.08 ms (step) remaining time: 2 days, 16:10:38
Epoch 1 | iter 1024 step 8 | loss train: 11.660, val: n/a | iter time: 294.46 ms (step) remaining time: 2 days, 15:36:26
Epoch 1 | iter 1152 step 9 | loss train: 11.640, val: n/a | iter time: 292.26 ms (step) remaining time: 2 days, 15:09:28
Epoch 1 | iter 1280 step 10 | loss train: 11.626, val: n/a | iter time: 289.93 ms (step) remaining time: 2 days, 14:47:34
Epoch 1 | iter 1408 step 11 | loss train: 11.584, val: n/a | iter time: 292.15 ms (step) remaining time: 2 days, 14:29:19
Epoch 1 | iter 1536 step 12 | loss train: 11.526, val: n/a | iter time: 291.24 ms (step) remaining time: 2 days, 14:13:54
Epoch 1 | iter 1664 step 13 | loss train: 11.483, val: n/a | iter time: 291.11 ms (step) remaining time: 2 days, 14:00:48
Epoch 1 | iter 1792 step 14 | loss train: 11.430, val: n/a | iter time: 290.68 ms (step) remaining time: 2 days, 13:49:24
Epoch 1 | iter 1920 step 15 | loss train: 11.392, val: n/a | iter time: 290.37 ms (step) remaining time: 2 days, 13:39:22
Epoch 1 | iter 2048 step 16 | loss train: 11.326, val: n/a | iter time: 290.31 ms (step) remaining time: 2 days, 13:30:34
Epoch 1 | iter 2176 step 17 | loss train: 11.279, val: n/a | iter time: 290.33 ms (step) remaining time: 2 days, 13:22:34
Epoch 1 | iter 2304 step 18 | loss train: 11.222, val: n/a | iter time: 290.50 ms (step) remaining time: 2 days, 13:15:27
Epoch 1 | iter 2432 step 19 | loss train: 11.163, val: n/a | iter time: 290.39 ms (step) remaining time: 2 days, 13:09:11
Epoch 1 | iter 2560 step 20 | loss train: 11.094, val: n/a | iter time: 290.00 ms (step) remaining time: 2 days, 13:03:21
# ...
Epoch 1 | iter 782592 step 6114 | loss train: 3.080, val: 3.255 | iter time: 288.91 ms (step) remaining time: 0:06:14
Epoch 1 | iter 782720 step 6115 | loss train: 3.096, val: 3.255 | iter time: 289.11 ms (step) remaining time: 0:05:39
Epoch 1 | iter 782848 step 6116 | loss train: 2.977, val: 3.255 | iter time: 289.28 ms (step) remaining time: 0:05:04
Epoch 1 | iter 782976 step 6117 | loss train: 3.040, val: 3.255 | iter time: 289.24 ms (step) remaining time: 0:04:29
Epoch 1 | iter 783104 step 6118 | loss train: 3.062, val: 3.255 | iter time: 290.49 ms (step) remaining time: 0:03:54
Epoch 1 | iter 783232 step 6119 | loss train: 3.037, val: 3.255 | iter time: 289.91 ms (step) remaining time: 0:03:19
Epoch 1 | iter 783360 step 6120 | loss train: 3.028, val: 3.255 | iter time: 289.49 ms (step) remaining time: 0:02:44
Epoch 1 | iter 783488 step 6121 | loss train: 3.007, val: 3.255 | iter time: 289.81 ms (step) remaining time: 0:02:09
Epoch 2 | iter 783616 step 6122 | loss train: 3.007, val: 3.255 | iter time: 289.34 ms (step) remaining time: 0:01:34
Epoch 2 | iter 783744 step 6123 | loss train: 3.046, val: 3.255 | iter time: 288.52 ms (step) remaining time: 0:00:59
Epoch 2 | iter 783872 step 6124 | loss train: 3.140, val: 3.255 | iter time: 288.66 ms (step) remaining time: 0:00:24
Validating ...
Final evaluation | val loss: 3.254 | val ppl: 25.904
Saving checkpoint to '../out/pretrain-core-0/final/lit_model.pth'
----------------------------------------
| Performance
| - Total tokens : 6,422,200,320
| - Training Time : 214857.29 s
| - Tok/sec : 109674.70 tok/s
| ----------------------------------------
| Memory Usage
| - Memory Used : 17.30 GB
----------------------------------------
```
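The reported validation perplexity is just the exponential of the validation loss:
```python
import math

# Perplexity is exp(cross-entropy loss): exp(3.254) ~ 25.9,
# consistent with the final evaluation above (the log uses the unrounded loss).
val_loss = 3.254
print(f"val ppl ~ {math.exp(val_loss):.2f}")
```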
Back up the `wandb` run directory:
```bash
mv wandb wandb-pretrain-core
```
Chat with the model:
```bash
CUDA_VISIBLE_DEVICES=0 CUDA_LAUNCH_BLOCKING=0 PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True litgpt chat ../out/pretrain-core-0/final
```
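Once the checkpoint has been converted to the Hugging Face format and published, it can also be loaded through `transformers` (a minimal sketch; the repo id below is a placeholder, not the published name):
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repo id -- substitute the actual published model name.
repo_id = "your-username/tangled-alpha-0.3-core"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype=torch.bfloat16)

inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
Evaluate with the `leaderboard` task suite: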
```bash
CUDA_VISIBLE_DEVICES=0 CUDA_LAUNCH_BLOCKING=0 PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True time litgpt evaluate --tasks 'leaderboard' --out_dir '../evaluate/pretrain-core-0/leaderboard/' --batch_size 1 --dtype 'bfloat16' '../out/pretrain-core-0/final'
```
```
# ...
```