-
Attention Is All You Need
Paper • 1706.03762 • Published • 61 -
LoRA: Low-Rank Adaptation of Large Language Models
Paper • 2106.09685 • Published • 39 -
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Paper • 2305.18290 • Published • 58 -
Lost in the Middle: How Language Models Use Long Contexts
Paper • 2307.03172 • Published • 40
Collections including paper arxiv:2307.09288
-
Will we run out of data? An analysis of the limits of scaling datasets in Machine Learning
Paper • 2211.04325 • Published -
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Paper • 1810.04805 • Published • 18 -
On the Opportunities and Risks of Foundation Models
Paper • 2108.07258 • Published -
Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks
Paper • 2204.07705 • Published • 1
-
Qwen Technical Report
Paper • 2309.16609 • Published • 35 -
Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models
Paper • 2311.07919 • Published • 10 -
Qwen2 Technical Report
Paper • 2407.10671 • Published • 163 -
Qwen2-Audio Technical Report
Paper • 2407.10759 • Published • 60
-
Mistral 7B
Paper • 2310.06825 • Published • 48 -
Llama 2: Open Foundation and Fine-Tuned Chat Models
Paper • 2307.09288 • Published • 243 -
OpenChat: Advancing Open-source Language Models with Mixed-Quality Data
Paper • 2309.11235 • Published • 15 -
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
Paper • 2501.12948 • Published • 391
-
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
Paper • 2402.17764 • Published • 615 -
Qwen2.5 Technical Report
Paper • 2412.15115 • Published • 367 -
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
Paper • 2404.14219 • Published • 257 -
LLM in a flash: Efficient Large Language Model Inference with Limited Memory
Paper • 2312.11514 • Published • 257
-
black-forest-labs/FLUX.1-dev
Text-to-Image • Updated • 2.72M • 10.1k -
openai/whisper-large-v3-turbo
Automatic Speech Recognition • Updated • 7.07M • 2.35k -
meta-llama/Llama-3.2-11B-Vision-Instruct
Image-Text-to-Text • Updated • 597k • 1.43k -
deepseek-ai/DeepSeek-V2.5
Text Generation • Updated • 1.69k • 706
-
Qwen2.5 Technical Report
Paper • 2412.15115 • Published • 367 -
Qwen2.5-Coder Technical Report
Paper • 2409.12186 • Published • 147 -
Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via Self-Improvement
Paper • 2409.12122 • Published • 3 -
Qwen2.5-VL Technical Report
Paper • 2502.13923 • Published • 186