Collections including paper arxiv:2311.16867

- Nemotron-4 15B Technical Report
  Paper • 2402.16819 • Published • 47
- Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models
  Paper • 2402.19427 • Published • 57
- RWKV: Reinventing RNNs for the Transformer Era
  Paper • 2305.13048 • Published • 19
- Reformer: The Efficient Transformer
  Paper • 2001.04451 • Published

- Attention Is All You Need
  Paper • 1706.03762 • Published • 61
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
  Paper • 1810.04805 • Published • 18
- RoBERTa: A Robustly Optimized BERT Pretraining Approach
  Paper • 1907.11692 • Published • 7
- DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
  Paper • 1910.01108 • Published • 14

- Mamba: Linear-Time Sequence Modeling with Selective State Spaces
  Paper • 2312.00752 • Published • 143
- Schrodinger Bridges Beat Diffusion Models on Text-to-Speech Synthesis
  Paper • 2312.03491 • Published • 35
- Order Matters in the Presence of Dataset Imbalance for Multilingual Learning
  Paper • 2312.06134 • Published • 3
- LLM in a flash: Efficient Large Language Model Inference with Limited Memory
  Paper • 2312.11514 • Published • 257

- Cognitive Architectures for Language Agents
  Paper • 2309.02427 • Published • 8
- Direct Preference Optimization: Your Language Model is Secretly a Reward Model
  Paper • 2305.18290 • Published • 58
- Orca 2: Teaching Small Language Models How to Reason
  Paper • 2311.11045 • Published • 75
- Pretraining Data Mixtures Enable Narrow Model Selection Capabilities in Transformer Models
  Paper • 2311.00871 • Published • 2

- Attention Is All You Need
  Paper • 1706.03762 • Published • 61
- FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning
  Paper • 2307.08691 • Published • 8
- Mixtral of Experts
  Paper • 2401.04088 • Published • 159
- Mistral 7B
  Paper • 2310.06825 • Published • 48