HuggingFaceTB's Collections

🌌 Cosmopedia

updated 3 days ago

Resources for the Cosmopedia dataset

  • HuggingFaceTB/cosmopedia

    Viewer • Updated Aug 12, 2024 • 31.1M • 26.7k • 611

  • HuggingFaceTB/cosmo-1b

    Text Generation • Updated Jul 8, 2024 • 130 • 131

  • Web clusters 🕸

    Running • 6

    Browse and explore clustered web samples by educational value

  • HuggingFaceTB/cosmopedia-100k

    Viewer • Updated Feb 19, 2024 • 100k • 2.85k • 43

  • HuggingFaceTB/cosmopedia-meta

    Viewer • Updated Feb 20, 2024 • 31.1M • 38 • 2

  • HuggingFaceTB/smollm-corpus

    Viewer • Updated Sep 6, 2024 • 237M • 8.86k • 331