- Simple linear attention language models balance the recall-throughput tradeoff
  Paper • 2402.18668 • Published • 21
- Linear Transformers with Learnable Kernel Functions are Better In-Context Models
  Paper • 2402.10644 • Published • 82
- Repeat After Me: Transformers are Better than State Space Models at Copying
  Paper • 2402.01032 • Published • 25
- Zoology: Measuring and Improving Recall in Efficient Language Models
  Paper • 2312.04927 • Published • 2
Collections including paper arxiv:2402.01032

- Repeat After Me: Transformers are Better than State Space Models at Copying
  Paper • 2402.01032 • Published • 25
- Craw4LLM: Efficient Web Crawling for LLM Pretraining
  Paper • 2502.13347 • Published • 28
- LiteASR: Efficient Automatic Speech Recognition with Low-Rank Approximation
  Paper • 2502.20583 • Published • 13

- Repeat After Me: Transformers are Better than State Space Models at Copying
  Paper • 2402.01032 • Published • 25
- Can Mamba Learn How to Learn? A Comparative Study on In-Context Learning Tasks
  Paper • 2402.04248 • Published • 33
- Linear Transformers with Learnable Kernel Functions are Better In-Context Models
  Paper • 2402.10644 • Published • 82
- In Search of Needles in a 10M Haystack: Recurrent Memory Finds What LLMs Miss
  Paper • 2402.10790 • Published • 43

- DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search
  Paper • 2408.08152 • Published • 60
- ChunkAttention: Efficient Self-Attention with Prefix-Aware KV Cache and Two-Phase Partition
  Paper • 2402.15220 • Published • 22
- Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models
  Paper • 2402.19427 • Published • 57
- Simple linear attention language models balance the recall-throughput tradeoff
  Paper • 2402.18668 • Published • 21

- Blending Is All You Need: Cheaper, Better Alternative to Trillion-Parameters LLM
  Paper • 2401.02994 • Published • 51
- MambaByte: Token-free Selective State Space Model
  Paper • 2401.13660 • Published • 59
- Repeat After Me: Transformers are Better than State Space Models at Copying
  Paper • 2402.01032 • Published • 25
- BlackMamba: Mixture of Experts for State-Space Models
  Paper • 2402.01771 • Published • 26

- DeepSpeed Ulysses: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models
  Paper • 2309.14509 • Published • 19
- LLM Augmented LLMs: Expanding Capabilities through Composition
  Paper • 2401.02412 • Published • 39
- DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
  Paper • 2401.06066 • Published • 55
- Tuning Language Models by Proxy
  Paper • 2401.08565 • Published • 24

- Large-Scale Automatic Audiobook Creation
  Paper • 2309.03926 • Published • 54
- Agents: An Open-source Framework for Autonomous Language Agents
  Paper • 2309.07870 • Published • 42
- PDFTriage: Question Answering over Long, Structured Documents
  Paper • 2309.08872 • Published • 54
- StarCoder: may the source be with you!
  Paper • 2305.06161 • Published • 31