TransMLA: Multi-head Latent Attention Is All You Need Paper • 2502.07864 • Published Feb 11 • 50
Fine-tuning LLMs to 1.58bit: extreme quantization made easy Article • Published Sep 18, 2024 • 242
Hogwild! Inference: Parallel LLM Generation via Concurrent Attention Paper • 2504.06261 • Published Apr 8 • 107
VAPO: Efficient and Reliable Reinforcement Learning for Advanced Reasoning Tasks Paper • 2504.05118 • Published Apr 7 • 25
Qwen2.5-Coder Collection Code-specific model series based on Qwen2.5 • 40 items • Updated 9 days ago • 310
BASS: Batched Attention-optimized Speculative Sampling Paper • 2404.15778 • Published Apr 24, 2024 • 11
ReFT: Representation Finetuning for Language Models Paper • 2404.03592 • Published Apr 4, 2024 • 98
ChatEDA: A Large Language Model Powered Autonomous Agent for EDA Paper • 2308.10204 • Published Aug 20, 2023 • 1
Microsoft Research Papers Collection #PapersToRead from Microsoft Research in the broad space of generative AI, multi-agent systems, responsible AI practices, LLM Ops, and language models • 20 items • Updated Jun 26, 2024 • 5
Papers Collection Large Language Model (LLM) and NLP-related papers • 269 items • Updated 1 day ago • 12
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits Paper • 2402.17764 • Published Feb 27, 2024 • 615
LLM Augmented LLMs: Expanding Capabilities through Composition Paper • 2401.02412 • Published Jan 4, 2024 • 39