vilm's Collections

Smol Pretraining

updated Feb 9, 2024

Curated, high-quality synthetic textbook datasets for pretraining


  • vilm/code-textbooks

    Viewer • Updated Jan 20, 2024 • 207k • 11 • 3

  • vilm/MathPile-arXiv

    Viewer • Updated Jan 22, 2024 • 340k • 11 • 2

  • vilm/MathPile-StackExchange

    Viewer • Updated Jan 22, 2024 • 264k • 13 • 1

  • vilm/MathPile-ProofWiki

    Viewer • Updated Jan 22, 2024 • 23.6k • 42

  • vilm/MathPile-Textbooks

    Viewer • Updated Jan 22, 2024 • 784 • 10

  • vilm/MathPile-Wikipedia

    Viewer • Updated Jan 22, 2024 • 20.9k • 11 • 1

  • vilm/RedPajama-v2-small

    Viewer • Updated Jan 20, 2024 • 500k • 29 • 1

  • vilm/RedPajama-v2-xsmall

    Viewer • Updated Jan 20, 2024 • 250k • 22 • 1

  • vilm/the-stack-smol-xl-cleaned

    Viewer • Updated Jan 20, 2024 • 205k • 16 • 1

  • vilm/refinedweb-1m-medium

    Viewer • Updated Jan 20, 2024 • 1M • 30 • 2