Article Welcome Gemma 3: Google's all new multimodal, multilingual, long context open LLM Mar 12 • 401
Article Introducing multi-backends (TRT-LLM, vLLM) support for Text Generation Inference Jan 16 • 72
Article From cloud to developers: Hugging Face and Microsoft Deepen Collaboration May 21, 2024 • 8
Article Unlocking Longer Generation with Key-Value Cache Quantization May 16, 2024 • 48
DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models Paper • 2309.03883 • Published Sep 7, 2023 • 35