- Will we run out of data? An analysis of the limits of scaling datasets in Machine Learning (arXiv:2211.04325)
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (arXiv:1810.04805)
- On the Opportunities and Risks of Foundation Models (arXiv:2108.07258)
- Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks (arXiv:2204.07705)
Collections including paper arXiv:2205.05198
- PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel (arXiv:2304.11277)
- Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism (arXiv:1909.08053)
- Reducing Activation Recomputation in Large Transformer Models (arXiv:2205.05198)
- GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism (arXiv:1811.06965)

- Yi: Open Foundation Models by 01.AI (arXiv:2403.04652)
- A Survey on Data Selection for Language Models (arXiv:2402.16827)
- Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research (arXiv:2402.00159)
- The RefinedWeb Dataset for Falcon LLM: Outperforming Curated Corpora with Web Data, and Web Data Only (arXiv:2306.01116)