Continued pretraining of Llama 3.1 8B on RefinedWeb for ~80M tokens, in an attempt to undo the annealing step and make the model behave more like a true base model.
From Our Page
Models (10)
from-our-page/BigLlama-2-120B (Text Generation)
from-our-page/Llama-4-Maverick-17B-128E-mlx-4bit (Text Generation)
from-our-page/llama3.1-8b-refinedbase-checkpoint-5120
from-our-page/llama3.1-8b-refinedbase-checkpoint-4480
from-our-page/llama3.1-8b-refinedbase-checkpoint-3840
from-our-page/llama3.1-8b-refinedbase-checkpoint-3200
from-our-page/llama3.1-8b-refinedbase-checkpoint-2560
from-our-page/llama3.1-8b-refinedbase-checkpoint-1920
from-our-page/llama3.1-8b-refinedbase-checkpoint-1280
from-our-page/llama3.1-8b-refinedbase-checkpoint-640
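As a rough aid for picking a checkpoint, the sketch below maps each refinedbase checkpoint name in the model list to an approximate number of tokens seen. It rests on two assumptions that the page does not confirm: that the numeric suffix is a training step count, and that the final checkpoint (5120) corresponds to the full ~80M tokens with a constant number of tokens per step. The helper names `repo_id` and `approx_tokens_seen` are hypothetical and exist only for illustration.

```python
# Checkpoint suffixes copied from the model list on this page.
ORG = "from-our-page"
STEPS = [640, 1280, 1920, 2560, 3200, 3840, 4480, 5120]

# "~80M tokens" comes from the description; the figure is approximate.
TOTAL_TOKENS = 80_000_000

# ASSUMPTION: the last checkpoint marks the end of the ~80M-token run.
FINAL_STEP = max(STEPS)


def repo_id(step: int) -> str:
    """Build the repository name for a given checkpoint suffix."""
    return f"{ORG}/llama3.1-8b-refinedbase-checkpoint-{step}"


def approx_tokens_seen(step: int) -> int:
    """Estimate tokens seen at a checkpoint.

    ASSUMPTION: the suffix is a step count and every step
    consumes the same number of tokens (linear interpolation).
    """
    return round(TOTAL_TOKENS * step / FINAL_STEP)


if __name__ == "__main__":
    for step in STEPS:
        millions = approx_tokens_seen(step) / 1e6
        print(f"{repo_id(step)}: ~{millions:.0f}M tokens (estimated)")
```

If either assumption is wrong, for example if the run used a warmup with smaller batches, the per-checkpoint estimates shift accordingly; only the ~80M total is stated in the description.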