codellama (Code Llama)

🚀 Supercharge your LLM apps with Langfuse on Hugging Face Spaces!

Langfuse brings end-to-end observability and tooling to accelerate your dev workflow from experiments through production

Now available as a Docker Space directly on the HF Hub! 🤗

🔍 Trace everything: monitor LLM calls, retrieval, and agent actions with popular frameworks
1⃣ One-click deployment: on Spaces with persistent storage and integrated OAuth
🛠 Simple Prompt Management: Version, edit, and update without redeployment
✅ Intuitive Evals: Collect user feedback, run model/prompt evaluations, and improve quality
📊 Dataset Creation: Build datasets directly from production data to enhance future performance

Kudos to the Langfuse team for this collab and the awesome, open-first product they’re building! 👏 @marcklingen @Clemo @MJannik

🔗 Space: langfuse/langfuse-template-space
🔗 Docs: https://huggingface.co/docs/hub/spaces-sdks-docker-langfuse

1 reply

·

Narsil

posted an update 5 months ago

Post

1593

Performance leap: TGI v3 is out. Processes 3x more tokens, 13x faster than vLLM on long prompts. Zero config !

3x more tokens.

By reducing our memory footprint, we’re able to ingest many more tokens and more dynamically than before. A single L4 (24GB) can handle 30k tokens on llama 3.1-8B, while vLLM gets barely 10k. A lot of work went into reducing the footprint of the runtime and its effect are best seen on smaller constrained environments.
13x faster

On long prompts (200k+ tokens) conversation replies take 27.5s in vLLM, while it takes only 2s in TGI. How so ? We keep the initial conversation around, so when a new reply comes in, we can answer almost instantly. The overhead of the lookup is ~5us. Thanks @Dani ël de Kok for the beast data structure.
Zero config

That’s it. Remove all the flags your are using and you’re likely to get the best performance. By evaluating the hardware and model, TGI carefully selects automatic values to give best performance. In production, we don’t have any flags anymore in our deployments. We kept all existing flags around, they may come in handy in niche scenarios.

Read more: https://huggingface.co/docs/text-generation-inference/conceptual/chunking

loubnabnl

posted an update 6 months ago

Post

3638

Making SmolLM2 reproducible: open-sourcing our training & evaluation toolkit 🛠️ https://github.com/huggingface/smollm/

- Pre-training code with nanotron
- Evaluation suite with lighteval
- Synthetic data generation using distilabel (powers our new SFT dataset HuggingFaceTB/smoltalk)
- Post-training scripts with TRL & the alignment handbook
- On-device tools with llama.cpp for summarization, rewriting & agents

Apache 2.0 licensed. V2 pre-training data mix coming soon!

Which other tools should we add next?

andrewrreed

posted an update 6 months ago

Post

1067

Trace LLM calls with Arize AI's Phoenix observability dashboards on Hugging Face Spaces! 🚀

✨ I just added a new recipe to the Open-Source AI Cookbook that shows you how to:
1️⃣ Deploy Phoenix on HF Spaces with persistent storage in a few clicks
2️⃣ Configure LLM tracing with the 𝗦𝗲𝗿𝘃𝗲𝗿𝗹𝗲𝘀𝘀 𝗜𝗻𝗳𝗲𝗿𝗲𝗻𝗰𝗲 𝗔𝗣𝗜
3️⃣ Observe multi-agent application runs with the CrewAI integration

𝗢𝗯𝘀𝗲𝗿𝘃𝗮𝗯𝗶𝗹𝗶𝘁𝘆 𝗶𝘀 𝗰𝗿𝘂𝗰𝗶𝗮𝗹 for building robust LLM apps.

Phoenix makes it easy to visualize trace data, evaluate performance, and track down issues. Give it a try!

🔗 Cookbook recipe: https://huggingface.co/learn/cookbook/en/phoenix_observability_on_hf_spaces
🔗 Phoenix docs: https://docs.arize.com/phoenix

ArthurZ

posted an update 6 months ago

Post

4057

Native tensor parallel has landed in transformers!!! https://github.com/huggingface/transformers/pull/34184 thanks a lot to the torch team for their support!

Contributions are welcome to support more models! 🔥

lvwerra

authored a paper 6 months ago

SelfCodeAlign: Self-Alignment for Code Generation

Paper • 2410.24198 • Published Oct 31, 2024 • 25

lvwerra

authored a paper 11 months ago

The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale

Paper • 2406.17557 • Published Jun 25, 2024 • 97

loubnabnl

authored a paper 11 months ago

The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale

Paper • 2406.17557 • Published Jun 25, 2024 • 97

loubnabnl

posted an update 11 months ago

Post

5887

🍷 FineWeb technical report is out and so is 📚 FineWeb-Edu, a 1.3 trillion tokens dataset that outperforms all other open web datasets, with remarkable improvements on educational benchmarks such as MMLU, ARC, and OpenBookQA.

Technical report: HuggingFaceFW/blogpost-fineweb-v1
Dataset: HuggingFaceFW/fineweb-edu

We used Llama 3 generations to train an educational quality classifier, filtering the 15 trillion tokens of FineWeb to select only those with high educational value (an approach also used in Llama 3 and Phi-3 training datasets). We're releasing both FineWeb-Edu and the classifier, along with a larger, less heavily filtered version containing 5.4 trillion tokens.

You can find more details about the dataset and the experiments we ran in the FineWeb technical report, It's a 45-minute read but it contains all the secret sauce for building high quality web datasets.

Enjoy!

lvwerra

authored a paper 11 months ago

Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations

Paper • 2405.18392 • Published May 28, 2024 • 12

loubnabnl

authored a paper 11 months ago

Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations

Paper • 2405.18392 • Published May 28, 2024 • 12

Narsil

posted an update 12 months ago

Post

1980

text-generation-inference v2.0.3 is out.

Main new features:
- Falcon2 support
- PaliGemma support
- New faster speculation method from IBM !

https://github.com/huggingface/text-generation-inference/releases

andrewrreed

posted an update about 1 year ago

Post

2624

🔬 Open LLM Progress Tracker 🔬

Inspired by the awesome work from @mlabonne , I created a Space to monitor the narrowing gap between open and proprietary LLMs as scored by the LMSYS Chatbot Arena ELO ratings 🤗

The goal is to have a continuously updated place to easily visualize these rapidly evolving industry trends 🚀

🔗 Open LLM Progress Tracker: andrewrreed/closed-vs-open-arena-elo
🔗 Source of Inspiration: https://www.linkedin.com/posts/maxime-labonne_arena-elo-graph-updated-with-new-models-activity-7187062633735368705-u2jB/

2 replies

·

Narsil

posted an update about 1 year ago

Post

1296

text-generation-inference 2.0.2 is out.

- Native support for Idefics2, with much better efficiency than llava 1.6 (next) !

Phi3, Increase VLM support in the openai layer.

Release notes https://github.com/huggingface/text-generation-inference/releases/tag/v2.0.2

Code Llama

AI & ML interests

Recent Activity

codellama's activity

SmolVLM: Redefining small and efficient multimodal models

SmolVLM: Redefining small and efficient multimodal models

SmolVLM: Redefining small and efficient multimodal models

SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model

SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model

Towards Best Practices for Open Datasets for LLM Training

SelfCodeAlign: Self-Alignment for Code Generation

The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale

The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale

Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations

Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations

AI & ML interests

Recent Activity

Team members 6

codellama's activity