view article Article Universal Assisted Generation: Faster Decoding with Any Assistant Model Oct 29, 2024 • 55
view article Article Assisted Generation: a new direction toward low-latency text generation May 11, 2023 • 58
view article Article Binary and Scalar Embedding Quantization for Significantly Faster & Cheaper Retrieval Mar 22, 2024 • 88
Running on CPU Upgrade 13k 13k Open LLM Leaderboard 🏆 Track, rank and evaluate open LLMs and chatbots