- Simple linear attention language models balance the recall-throughput tradeoff
  Paper • 2402.18668 • Published • 21
- Linear Transformers with Learnable Kernel Functions are Better In-Context Models
  Paper • 2402.10644 • Published • 82
- Repeat After Me: Transformers are Better than State Space Models at Copying
  Paper • 2402.01032 • Published • 25
- Zoology: Measuring and Improving Recall in Efficient Language Models
  Paper • 2312.04927 • Published • 2
Collections including paper arxiv:2402.01032

- Repeat After Me: Transformers are Better than State Space Models at Copying
  Paper • 2402.01032 • Published • 25
- Craw4LLM: Efficient Web Crawling for LLM Pretraining
  Paper • 2502.13347 • Published • 28
- LiteASR: Efficient Automatic Speech Recognition with Low-Rank Approximation
  Paper • 2502.20583 • Published • 13

- Repeat After Me: Transformers are Better than State Space Models at Copying
  Paper • 2402.01032 • Published • 25
- Can Mamba Learn How to Learn? A Comparative Study on In-Context Learning Tasks
  Paper • 2402.04248 • Published • 33
- Linear Transformers with Learnable Kernel Functions are Better In-Context Models
  Paper • 2402.10644 • Published • 82
- In Search of Needles in a 10M Haystack: Recurrent Memory Finds What LLMs Miss
  Paper • 2402.10790 • Published • 43

- DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search
  Paper • 2408.08152 • Published • 60
- ChunkAttention: Efficient Self-Attention with Prefix-Aware KV Cache and Two-Phase Partition
  Paper • 2402.15220 • Published • 22
- Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models
  Paper • 2402.19427 • Published • 57
- Simple linear attention language models balance the recall-throughput tradeoff
  Paper • 2402.18668 • Published • 21

- Blending Is All You Need: Cheaper, Better Alternative to Trillion-Parameters LLM
  Paper • 2401.02994 • Published • 51
- MambaByte: Token-free Selective State Space Model
  Paper • 2401.13660 • Published • 59
- Repeat After Me: Transformers are Better than State Space Models at Copying
  Paper • 2402.01032 • Published • 25
- BlackMamba: Mixture of Experts for State-Space Models
  Paper • 2402.01771 • Published • 26

- DeepSpeed Ulysses: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models
  Paper • 2309.14509 • Published • 19
- LLM Augmented LLMs: Expanding Capabilities through Composition
  Paper • 2401.02412 • Published • 39
- DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
  Paper • 2401.06066 • Published • 55
- Tuning Language Models by Proxy
  Paper • 2401.08565 • Published • 24

- Large-Scale Automatic Audiobook Creation
  Paper • 2309.03926 • Published • 54
- Agents: An Open-source Framework for Autonomous Language Agents
  Paper • 2309.07870 • Published • 42
- PDFTriage: Question Answering over Long, Structured Documents
  Paper • 2309.08872 • Published • 54
- StarCoder: may the source be with you!
  Paper • 2305.06161 • Published • 31