- Will we run out of data? An analysis of the limits of scaling datasets in Machine Learning (arXiv:2211.04325)
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (arXiv:1810.04805)
- On the Opportunities and Risks of Foundation Models (arXiv:2108.07258)
- Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks (arXiv:2204.07705)
Collections including paper arXiv:2205.05198
- PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel (arXiv:2304.11277)
- Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism (arXiv:1909.08053)
- Reducing Activation Recomputation in Large Transformer Models (arXiv:2205.05198)
- GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism (arXiv:1811.06965)

- Yi: Open Foundation Models by 01.AI (arXiv:2403.04652)
- A Survey on Data Selection for Language Models (arXiv:2402.16827)
- Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research (arXiv:2402.00159)
- The RefinedWeb Dataset for Falcon LLM: Outperforming Curated Corpora with Web Data, and Web Data Only (arXiv:2306.01116)