Hugging Face community’s Wikimedia datasets Collection Wikimedia datasets created by the Hugging Face community, not Wikimedia. Sorted by Wikimedia project. • 17 items • Updated Jun 7, 2024 • 11
SwallowMath Collection Rewriting Pre-Training Data Boosts LLM Performance in Math and Code • 11 items • Updated 1 day ago • 2
SwallowCode Collection Rewriting Pre-Training Data Boosts LLM Performance in Math and Code • 66 items • Updated 1 day ago • 2
CLIMB: CLustering-based Iterative Data Mixture Bootstrapping for Language Model Pre-training Paper • 2504.13161 • Published 21 days ago • 88
EMOVA-Datasets Collection A collection of EMOVA datasets (https://emova-ollm.github.io/) • 6 items • Updated Mar 14 • 2
Article LeRobot goes to driving school: World’s largest open-source self-driving dataset Mar 11 • 78
SYNTHETIC-1 Collection A collection of tasks & verifiers for reasoning datasets • 9 items • Updated Feb 20 • 52
Article Introducing Three New Serverless Inference Providers: Hyperbolic, Nebius AI Studio, and Novita 🔥 Feb 18 • 99