ProX Dataset

updated Feb 14

A collection of pre-training corpora refined by ProX.

  • gair-prox/DCLM-pro

    Updated Feb 15 • 366M rows • 5.24k downloads • 10 likes

  • gair-prox/FineWeb-pro

    Updated Sep 26, 2024 • 63.1M rows • 597 downloads • 24 likes

  • gair-prox/open-web-math-pro

    Updated Sep 26, 2024 • 2.58M rows • 472 downloads • 11 likes

  • gair-prox/RedPajama-pro

    Updated Sep 26, 2024 • 10.2M rows • 336 downloads • 4 likes

  • gair-prox/c4-pro

    Updated Sep 26, 2024 • 40.1M rows • 167 downloads • 6 likes

  • Programming Every Example: Lifting Pre-training Data Quality like Experts at Scale

    Paper • 2409.17115 • Published Sep 25, 2024 • 63
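
The corpora listed above can be previewed without downloading them in full. The following is a minimal sketch, assuming the Hugging Face `datasets` library is installed and that each repository exposes a `train` split (an assumption, not confirmed by this page). The helper names `repo_id` and `stream_examples` are hypothetical and exist only for illustration.

```python
from itertools import islice

# Repository IDs exactly as they appear in the collection listing.
PROX_DATASETS = [
    "gair-prox/DCLM-pro",
    "gair-prox/FineWeb-pro",
    "gair-prox/open-web-math-pro",
    "gair-prox/RedPajama-pro",
    "gair-prox/c4-pro",
]


def repo_id(short_name: str) -> str:
    """Resolve a short name such as 'c4-pro' to its full repository ID.

    Matching is case-insensitive. Hypothetical helper for illustration.
    """
    for rid in PROX_DATASETS:
        if rid.split("/", 1)[1].lower() == short_name.lower():
            return rid
    raise KeyError(f"{short_name!r} is not part of the ProX Dataset collection")


def stream_examples(short_name: str, n: int = 3, split: str = "train"):
    """Return the first n records of a corpus using streaming mode.

    The split name 'train' is an assumption; check the dataset card
    for the splits and configurations that actually exist.
    """
    # Imported lazily so the helpers above work without the library.
    from datasets import load_dataset

    ds = load_dataset(repo_id(short_name), split=split, streaming=True)
    return list(islice(ds, n))
```

Calling `stream_examples("open-web-math-pro", n=2)` would fetch two records over the network. Streaming is used here because the larger corpora hold hundreds of millions of rows.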