Code Llama

company
Activity Feed

AI & ML interests

None defined yet.

Recent Activity

codellama's activity

andrewrreed 
posted an update 4 months ago
view post
Post
2896
🚀 Supercharge your LLM apps with Langfuse on Hugging Face Spaces!

Langfuse brings end-to-end observability and tooling to accelerate your dev workflow from experiments through production

Now available as a Docker Space directly on the HF Hub! 🤗

🔍 Trace everything: monitor LLM calls, retrieval, and agent actions with popular frameworks
1⃣ One-click deployment: on Spaces with persistent storage and integrated OAuth
🛠 Simple Prompt Management: Version, edit, and update without redeployment
✅ Intuitive Evals: Collect user feedback, run model/prompt evaluations, and improve quality
📊 Dataset Creation: Build datasets directly from production data to enhance future performance

Kudos to the Langfuse team for this collab and the awesome, open-first product they’re building! 👏 @marcklingen @Clemo @MJannik

🔗 Space: langfuse/langfuse-template-space
🔗 Docs: https://huggingface.co/docs/hub/spaces-sdks-docker-langfuse
  • 1 reply
·
Narsil 
posted an update 5 months ago
view post
Post
1593
Performance leap: TGI v3 is out. Processes 3x more tokens, 13x faster than vLLM on long prompts. Zero config !



3x more tokens.

By reducing our memory footprint, we’re able to ingest many more tokens and more dynamically than before. A single L4 (24GB) can handle 30k tokens on llama 3.1-8B, while vLLM gets barely 10k. A lot of work went into reducing the footprint of the runtime and its effect are best seen on smaller constrained environments.
13x faster

On long prompts (200k+ tokens) conversation replies take 27.5s in vLLM, while it takes only 2s in TGI. How so ? We keep the initial conversation around, so when a new reply comes in, we can answer almost instantly. The overhead of the lookup is ~5us. Thanks @Dani ël de Kok for the beast data structure.
Zero config

That’s it. Remove all the flags your are using and you’re likely to get the best performance. By evaluating the hardware and model, TGI carefully selects automatic values to give best performance. In production, we don’t have any flags anymore in our deployments. We kept all existing flags around, they may come in handy in niche scenarios.

Read more: https://huggingface.co/docs/text-generation-inference/conceptual/chunking
loubnabnl 
posted an update 6 months ago
view post
Post
3638
Making SmolLM2 reproducible: open-sourcing our training & evaluation toolkit 🛠️ https://github.com/huggingface/smollm/

- Pre-training code with nanotron
- Evaluation suite with lighteval
- Synthetic data generation using distilabel (powers our new SFT dataset HuggingFaceTB/smoltalk)
- Post-training scripts with TRL & the alignment handbook
- On-device tools with llama.cpp for summarization, rewriting & agents

Apache 2.0 licensed. V2 pre-training data mix coming soon!

Which other tools should we add next?
andrewrreed 
posted an update 6 months ago
view post
Post
1067
Trace LLM calls with Arize AI's Phoenix observability dashboards on Hugging Face Spaces! 🚀

✨ I just added a new recipe to the Open-Source AI Cookbook that shows you how to:
1️⃣ Deploy Phoenix on HF Spaces with persistent storage in a few clicks
2️⃣ Configure LLM tracing with the 𝗦𝗲𝗿𝘃𝗲𝗿𝗹𝗲𝘀𝘀 𝗜𝗻𝗳𝗲𝗿𝗲𝗻𝗰𝗲 𝗔𝗣𝗜
3️⃣ Observe multi-agent application runs with the CrewAI integration

𝗢𝗯𝘀𝗲𝗿𝘃𝗮𝗯𝗶𝗹𝗶𝘁𝘆 𝗶𝘀 𝗰𝗿𝘂𝗰𝗶𝗮𝗹 for building robust LLM apps.

Phoenix makes it easy to visualize trace data, evaluate performance, and track down issues. Give it a try!

🔗 Cookbook recipe: https://huggingface.co/learn/cookbook/en/phoenix_observability_on_hf_spaces
🔗 Phoenix docs: https://docs.arize.com/phoenix
ArthurZ 
posted an update 6 months ago
loubnabnl 
posted an update 11 months ago
view post
Post
5887
🍷 FineWeb technical report is out and so is 📚 FineWeb-Edu, a 1.3 trillion tokens dataset that outperforms all other open web datasets, with remarkable improvements on educational benchmarks such as MMLU, ARC, and OpenBookQA.

Technical report: HuggingFaceFW/blogpost-fineweb-v1
Dataset: HuggingFaceFW/fineweb-edu

We used Llama 3 generations to train an educational quality classifier, filtering the 15 trillion tokens of FineWeb to select only those with high educational value (an approach also used in Llama 3 and Phi-3 training datasets). We're releasing both FineWeb-Edu and the classifier, along with a larger, less heavily filtered version containing 5.4 trillion tokens.

You can find more details about the dataset and the experiments we ran in the FineWeb technical report, It's a 45-minute read but it contains all the secret sauce for building high quality web datasets.

Enjoy!
Narsil 
posted an update 12 months ago
andrewrreed 
posted an update about 1 year ago
view post
Post
2624
🔬 Open LLM Progress Tracker 🔬

Inspired by the awesome work from @mlabonne , I created a Space to monitor the narrowing gap between open and proprietary LLMs as scored by the LMSYS Chatbot Arena ELO ratings 🤗

The goal is to have a continuously updated place to easily visualize these rapidly evolving industry trends 🚀

🔗 Open LLM Progress Tracker: andrewrreed/closed-vs-open-arena-elo
🔗 Source of Inspiration: https://www.linkedin.com/posts/maxime-labonne_arena-elo-graph-updated-with-new-models-activity-7187062633735368705-u2jB/
  • 2 replies
·
Narsil 
posted an update about 1 year ago