-
microsoft/bitnet-b1.58-2B-4T
Text Generation • Updated • 84.8k • 948 -
microsoft/bitnet-b1.58-2B-4T-bf16
Text Generation • Updated • 3.64k • 26 -
microsoft/bitnet-b1.58-2B-4T-gguf
Text Generation • Updated • 35.3k • 159 -
BitNet b1.58 2B4T Technical Report
Paper • 2504.12285 • Published • 70
Collections
Discover the best community collections!
Collections including paper arxiv:2310.11453
-
Chain-of-Verification Reduces Hallucination in Large Language Models
Paper • 2309.11495 • Published • 39 -
Adapting Large Language Models via Reading Comprehension
Paper • 2309.09530 • Published • 78 -
CulturaX: A Cleaned, Enormous, and Multilingual Dataset for Large Language Models in 167 Languages
Paper • 2309.09400 • Published • 85 -
Language Modeling Is Compression
Paper • 2309.10668 • Published • 83
-
Will we run out of data? An analysis of the limits of scaling datasets in Machine Learning
Paper • 2211.04325 • Published -
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Paper • 1810.04805 • Published • 18 -
On the Opportunities and Risks of Foundation Models
Paper • 2108.07258 • Published -
Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks
Paper • 2204.07705 • Published • 1
-
1.58-bit FLUX
Paper • 2412.18653 • Published • 84 -
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
Paper • 2402.17764 • Published • 615 -
BitNet a4.8: 4-bit Activations for 1-bit LLMs
Paper • 2411.04965 • Published • 69 -
BitNet: Scaling 1-bit Transformers for Large Language Models
Paper • 2310.11453 • Published • 102
-
Self-Play Preference Optimization for Language Model Alignment
Paper • 2405.00675 • Published • 28 -
FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
Paper • 2205.14135 • Published • 13 -
Attention Is All You Need
Paper • 1706.03762 • Published • 61 -
FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning
Paper • 2307.08691 • Published • 8
-
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
Paper • 2402.17764 • Published • 615 -
BitNet: Scaling 1-bit Transformers for Large Language Models
Paper • 2310.11453 • Published • 102 -
Mixture-of-Depths: Dynamically allocating compute in transformer-based language models
Paper • 2404.02258 • Published • 106 -
TransformerFAM: Feedback attention is working memory
Paper • 2404.09173 • Published • 44
-
BitNet: Scaling 1-bit Transformers for Large Language Models
Paper • 2310.11453 • Published • 102 -
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
Paper • 2404.14219 • Published • 257 -
LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding
Paper • 2404.16710 • Published • 80