Resurrecting Recurrent Neural Networks for Long Sequences Paper • 2303.06349 • Published Mar 11, 2023
Characterizing signal propagation to close the performance gap in unnormalized ResNets Paper • 2101.08692 • Published Jan 21, 2021
High-Performance Large-Scale Image Recognition Without Normalization Paper • 2102.06171 • Published Feb 11, 2021
On the Universality of Linear Recurrences Followed by Nonlinear Projections Paper • 2307.11888 • Published Jul 21, 2023
Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models Paper • 2402.19427 • Published Feb 29, 2024
Gemma: Open Models Based on Gemini Research and Technology Paper • 2403.08295 • Published Mar 13, 2024
RecurrentGemma: Moving Past Transformers for Efficient Open Language Models Paper • 2404.07839 • Published Apr 11, 2024