CLIMB: CLustering-based Iterative Data Mixture Bootstrapping for Language Model Pre-training Paper • 2504.13161 • Published 21 days ago • 88
Qwen2.5-Math Collection Math-specific model series based on Qwen2.5 • 11 items • Updated 10 days ago • 81
OLMo 2 Preview Post-trained Models Collection These model's tokenizer did not use HF's fast tokenizer, resulting in variations in how pre-tokenization was applied. Resolved in latest versions. • 6 items • Updated 8 days ago • 4
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model Paper • 2502.02737 • Published Feb 4 • 229