- Will we run out of data? An analysis of the limits of scaling datasets in Machine Learning
  Paper • 2211.04325 • Published
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
  Paper • 1810.04805 • Published • 18
- On the Opportunities and Risks of Foundation Models
  Paper • 2108.07258 • Published
- Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks
  Paper • 2204.07705 • Published • 1
Collections including paper arxiv:2308.12950

- Masked Audio Generation using a Single Non-Autoregressive Transformer
  Paper • 2401.04577 • Published • 44
- Code Llama: Open Foundation Models for Code
  Paper • 2308.12950 • Published • 26
- Simple and Controllable Music Generation
  Paper • 2306.05284 • Published • 153
- High Fidelity Neural Audio Compression
  Paper • 2210.13438 • Published • 4

- Design2Code: How Far Are We From Automating Front-End Engineering?
  Paper • 2403.03163 • Published • 98
- Wukong: Towards a Scaling Law for Large-Scale Recommendation
  Paper • 2403.02545 • Published • 17
- StarCoder: may the source be with you!
  Paper • 2305.06161 • Published • 31
- Exploring Parameter-Efficient Fine-Tuning Techniques for Code Generation with Large Language Models
  Paper • 2308.10462 • Published • 2

- In Search of Needles in a 10M Haystack: Recurrent Memory Finds What LLMs Miss
  Paper • 2402.10790 • Published • 43
- LLM Comparator: Visual Analytics for Side-by-Side Evaluation of Large Language Models
  Paper • 2402.10524 • Published • 24
- CompassJudger-1: All-in-one Judge Model Helps Model Evaluation and Evolution
  Paper • 2410.16256 • Published • 61
- Code Llama: Open Foundation Models for Code
  Paper • 2308.12950 • Published • 26

- StarCoder: may the source be with you!
  Paper • 2305.06161 • Published • 31
- WizardCoder: Empowering Code Large Language Models with Evol-Instruct
  Paper • 2306.08568 • Published • 28
- SantaCoder: don't reach for the stars!
  Paper • 2301.03988 • Published • 7
- DeepSeek-Coder: When the Large Language Model Meets Programming -- The Rise of Code Intelligence
  Paper • 2401.14196 • Published • 63

- Attention Is All You Need
  Paper • 1706.03762 • Published • 61
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
  Paper • 1810.04805 • Published • 18
- RoBERTa: A Robustly Optimized BERT Pretraining Approach
  Paper • 1907.11692 • Published • 7
- DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
  Paper • 1910.01108 • Published • 14

- ShareGPT4V: Improving Large Multi-Modal Models with Better Captions
  Paper • 2311.12793 • Published • 18
- PhysGaussian: Physics-Integrated 3D Gaussians for Generative Dynamics
  Paper • 2311.12198 • Published • 22
- CoDi-2: In-Context, Interleaved, and Interactive Any-to-Any Generation
  Paper • 2311.18775 • Published • 6
- Code Llama: Open Foundation Models for Code
  Paper • 2308.12950 • Published • 26

- Attention Is All You Need
  Paper • 1706.03762 • Published • 61
- FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning
  Paper • 2307.08691 • Published • 8
- Mixtral of Experts
  Paper • 2401.04088 • Published • 159
- Mistral 7B
  Paper • 2310.06825 • Published • 48