An NLP-Driven Approach Using Twitter Data for Tailored K-pop Artist Recommendations Paper • 2503.21189 • Published Mar 27
Understanding and controlling the geometry of memory organization in RNNs Paper • 2502.07256 • Published Feb 11
HiPPO-Prophecy: State-Space Models can Provably Learn Dynamical Systems in Context Paper • 2407.09375 • Published Jul 12, 2024
Sliding Window Attention Training for Efficient Large Language Models Paper • 2502.18845 • Published Feb 26
DYNAMAX: Dynamic computing for Transformers and Mamba based architectures Paper • 2504.20922 • Published 9 days ago
MonoByte: A Pool of Monolingual Byte-level Language Models Paper • 2209.11035 • Published Sep 22, 2022
vGamba: Attentive State Space Bottleneck for efficient Long-range Dependencies in Visual Recognition Paper • 2503.21262 • Published Mar 27
Hierarchical Autoregressive Transformers: Combining Byte- and Word-Level Processing for Robust, Adaptable Language Models Paper • 2501.10322 • Published Jan 17
Thinking Machines: A Survey of LLM based Reasoning Strategies Paper • 2503.10814 • Published Mar 13
d1: Scaling Reasoning in Diffusion Large Language Models via Reinforcement Learning Paper • 2504.12216 • Published 22 days ago