Thank you for this post! Very useful and well explained. I didn't really understand the hype behind MCP before; now it's a bit clearer!
Frederic Branchaud-Charron
Dref360
AI & ML interests
Bayesian deep learning, uncertainty estimation, and trustworthiness.
Recent Activity
commented on an article about 23 hours ago: Tiny Agents: a MCP-powered agent in 50 lines of code
upvoted an article about 23 hours ago: Tiny Agents: a MCP-powered agent in 50 lines of code
upvoted a paper 3 months ago: ViTPose++: Vision Transformer for Generic Body Pose Estimation
Dref360's activity

commented on Tiny Agents: a MCP-powered agent in 50 lines of code, about 23 hours ago

upvoted an article about 23 hours ago
Tiny Agents: a MCP-powered agent in 50 lines of code
upvoted a paper 3 months ago

reacted to m-ric's post 3 months ago
Today we make the biggest release in smolagents so far: we enable vision models, which allows us to build powerful web browsing agents! 🥳
Our agents can now casually open up a web browser and navigate it by scrolling, clicking elements on the webpage, and going back, just like a user would.
The demo below shows Claude-3.5-Sonnet browsing GitHub for the task: "Find how many commits the author of the current top trending repo did over last year."
Go try it out, it's the most cracked agentic stuff I've seen in a while 🤯 (well, along with OpenAI's Operator, which beat us by one day)
For more details, read our announcement blog: https://huggingface.co/blog/smolagents-can-see
The code for the web browser example is here: https://github.com/huggingface/smolagents/blob/main/examples/vlm_web_browser.py
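To make the task above concrete, here is a minimal sketch of driving a browsing task with a smolagents CodeAgent and a vision-capable model. It is not the full vlm_web_browser.py example: the model id string, the empty tool list, and the authorized import are assumptions for illustration, and the real example registers helium/selenium click, scroll, and go-back tools plus a screenshot callback so the model can see each page state.

# Minimal sketch, not the full vlm_web_browser.py example (see link above).
from smolagents import CodeAgent, LiteLLMModel

# Any multimodal model works; the demo used Claude-3.5-Sonnet (model id assumed here).
model = LiteLLMModel(model_id="anthropic/claude-3-5-sonnet-20241022")

agent = CodeAgent(
    tools=[],  # the real example registers click / scroll / go_back style browser tools here
    model=model,
    additional_authorized_imports=["helium"],  # lets generated code drive the browser
)

agent.run(
    "Find how many commits the author of the current top trending repo did over last year."
)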

reacted to merve's post 3 months ago
smolagents can see 🔥
we just shipped vision support to smolagents: agentic computers FTW
you can now:
- let the agent get images dynamically (e.g. an agentic web browser)
- pass images at the init of the agent (e.g. chatting with documents, filling forms automatically, etc.)
with only a few LoC changed! 🤯
you can use transformers models locally (like Qwen2VL) OR plug in your favorite multimodal inference provider (gpt-4o, anthropic & co)
read our blog: http://hf.co/blog/smolagents-can-see
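As a rough illustration of the "few LoC" claim, here is a hedged sketch of passing an image to an agent at run time with a local VLM. The images keyword and the TransformersModel usage follow the description in the post, but check the smolagents docs for exact signatures; the input file name is made up.

# Hedged sketch of the vision workflow described above.
from PIL import Image
from smolagents import CodeAgent, TransformersModel

# Local VLM (as mentioned in the post); swap for an API-backed multimodal provider if preferred.
model = TransformersModel(model_id="Qwen/Qwen2-VL-7B-Instruct")
agent = CodeAgent(tools=[], model=model)

document = Image.open("invoice.png")  # made-up input image
agent.run("Extract the total amount due from this document.", images=[document])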

reacted to merve's post with ❤️ 3 months ago
Everything that happened this week in open AI, a recap
merve/jan-17-releases-678a673a9de4a4675f215bf5
Multimodal
- MiniCPM-o 2.6 is a new SOTA any-to-any model by OpenBMB (vision, speech, and text!)
- VideoChat-Flash-Qwen2.5-2B is part of a new family of video multimodal models by OpenGVLab that comes in 2B & 7B sizes at resolutions 224 & 448
- ByteDance released a larger SA2VA that comes in at 26B parameters
- Dataset: VRC-Bench is a new diverse benchmark for multimodal LLM reasoning performance
LLMs
- MiniMax-Text-01 is a huge new language model (456B total, 45.9B active params) by MiniMaxAI with a context length of 4M tokens 🤯
- Dataset: Sky-T1-data-17k is a diverse dataset used to train Sky-T1-32B
- kyutai released Helium-1-Preview-2B, a new small multilingual LM
- Wayfarer-12B is a new LLM able to write D&D adventures
- ReaderLM-v2 is a new HTML parsing model by Jina AI
- Dria released Dria-Agent-a-3B, a new agentic coding model (Pythonic function calling) based on Qwen2.5 Coder
- Unsloth released Phi-4 and a faster, more memory-efficient Llama 3.3
Vision
- MatchAnything is a new foundation model for image matching
- FitDiT is a high-fidelity VTON (virtual try-on) model based on the DiT architecture
Audio
- OuteTTS-0.3-1B is a new multilingual text-to-speech model with voice cloning and emotion control capabilities
Retrieval
- lightblue released LB-reranker-0.5B-v1.0, a new reranker based on Qwen2.5 that can handle 95+ languages
- cde-small-v2 is a new SOTA small retrieval model by @jxm

reacted to Sri-Vigneshwar-DJ's post with ❤️ 4 months ago
Combining smolagents with Anthropic's best practices simplifies building powerful AI agents:
1. Code-Based Agents: Write actions as Python code, reducing steps by 30%.
2. Prompt Chaining: Break tasks into sequential subtasks with validation gates.
3. Routing: Classify inputs and direct them to specialized handlers.
4. Fallback: Handle tasks even if classification fails.
https://huggingface.co/blog/Sri-Vigneshwar-DJ/building-effective-agents-with-anthropics-best-pra
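A framework-agnostic sketch of points 3 and 4 above (routing with a fallback). The classifier and the handlers below are hypothetical stand-ins for real LLM calls, not code from the linked post.

def classify_query(query: str) -> str:
    # In practice this would be an LLM call that returns a label (with a confidence check).
    if "refund" in query.lower():
        return "billing"
    if "error" in query.lower():
        return "tech"
    return "unknown"

def billing_handler(query: str) -> str:
    return f"[billing agent] handling: {query}"

def tech_handler(query: str) -> str:
    return f"[tech agent] handling: {query}"

def general_fallback(query: str) -> str:
    # Point 4: if classification fails or yields an unknown label, a general agent still answers.
    return f"[general agent] handling: {query}"

HANDLERS = {"billing": billing_handler, "tech": tech_handler}

def route(query: str) -> str:
    # Point 3: classify the input and direct it to a specialized handler.
    return HANDLERS.get(classify_query(query), general_fallback)(query)

print(route("I got an error installing the package"))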

upvoted a paper 4 months ago

upvoted a collection 4 months ago

reacted to lewtun's post with ❤️🔥 4 months ago
We outperform Llama 70B with Llama 3B on hard math by scaling test-time compute 🔥
How? By combining step-wise reward models with tree search algorithms :)
We show that smol models can match or exceed the performance of their much larger siblings when given enough "time to think"
We're open-sourcing the full recipe and sharing a detailed blog post.
In our blog post we cover:
- Compute-optimal scaling: how we implemented DeepMind's recipe to boost the mathematical capabilities of open models at test time.
- Diverse Verifier Tree Search (DVTS): an unpublished extension we developed to the verifier-guided tree search technique. This simple yet effective method improves diversity and delivers better performance, particularly at large test-time compute budgets.
- Search and Learn: a lightweight toolkit for implementing search strategies with LLMs, built for speed with vLLM
Here are the links:
- Blog post: HuggingFaceH4/blogpost-scaling-test-time-compute
- Code: https://github.com/huggingface/search-and-learn
Enjoy!
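For intuition, here is a hedged sketch of the simplest strategy in this family, weighted best-of-N selection with a reward model; the blog's DVTS method goes further with verifier-guided tree search. The generate and score callables below are hypothetical stand-ins for an LLM sampler and a step-wise reward model, not the search-and-learn API.

import random
from collections import defaultdict

def weighted_best_of_n(problem, generate, score, n=16):
    """Sample n candidate solutions, score each with a reward model, and return
    the final answer whose candidates accumulate the highest total score
    (a weighted majority vote)."""
    candidates = [generate(problem) for _ in range(n)]  # each: {"steps": ..., "answer": ...}
    totals = defaultdict(float)
    for cand in candidates:
        totals[cand["answer"]] += score(problem, cand["steps"])
    return max(totals, key=totals.get)

# Toy stand-ins just to show the call shape; real usage would sample from an LLM
# and score each reasoning trace with a process reward model.
demo_generate = lambda p: {"steps": "step-by-step reasoning...", "answer": random.choice(["42", "41"])}
demo_score = lambda p, steps: random.random()
print(weighted_best_of_n("What is 6 * 7?", demo_generate, demo_score))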

reacted to burtenshaw's post with ❤️ 5 months ago
For anyone looking to boost their LLM fine-tuning and alignment skills this December: we're running a free and open course called smol course. It's not big like Li Yin and @mlabonne, it's just smol.
- It focuses on practical use cases, so if you're working on something, bring it along.
- It's peer-reviewed and open, so you can discuss and get feedback.
- If you're already a smol pro, feel free to drop a star or an issue.
>> Part 1 starts now, and it's on instruction tuning!
https://github.com/huggingface/smol-course

reacted to andito's post with 🔥❤️ 5 months ago
Let's go! We are releasing SmolVLM, a smol 2B VLM built for on-device inference that outperforms all models at similar GPU RAM usage and token throughput.
- SmolVLM generates tokens 7.5 to 16 times faster than Qwen2-VL! 🤯
- Other models at this size crash a laptop, but SmolVLM comfortably generates 17 tokens/sec on a MacBook!
- SmolVLM can be fine-tuned on a Google Colab! Or process millions of documents with a consumer GPU!
- SmolVLM even outperforms larger models in video benchmarks, despite not even being trained on videos!
Check out more!
Demo: HuggingFaceTB/SmolVLM
Blog: https://huggingface.co/blog/smolvlm
Model: HuggingFaceTB/SmolVLM-Instruct
Fine-tuning script: https://github.com/huggingface/smollm/blob/main/finetuning/Smol_VLM_FT.ipynb
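For reference, a hedged sketch of plain transformers inference with SmolVLM-Instruct, following the usage pattern on the model card at the time; exact argument names may differ across transformers versions, and the image path is made up.

from PIL import Image
from transformers import AutoProcessor, AutoModelForVision2Seq

model_id = "HuggingFaceTB/SmolVLM-Instruct"
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForVision2Seq.from_pretrained(model_id)

image = Image.open("photo.jpg")  # made-up local image path
messages = [{"role": "user",
             "content": [{"type": "image"},
                         {"type": "text", "text": "Describe this image briefly."}]}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=prompt, images=[image], return_tensors="pt")

generated_ids = model.generate(**inputs, max_new_tokens=128)
print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0])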

reacted to merve's post 5 months ago
Small yet mighty!
We are releasing SmolVLM: a new 2B small vision language model made for on-device use, fine-tunable on a consumer GPU, and immensely memory efficient.
We release three checkpoints under Apache 2.0: SmolVLM-Instruct, SmolVLM-Synthetic and SmolVLM-Base HuggingFaceTB/smolvlm-6740bd584b2dcbf51ecb1f39
Learn more from our blog here: huggingface.co/blog/smolvlm
This release comes with a demo, fine-tuning code, MLX integration and TRL integration for DPO
Try the demo: HuggingFaceTB/SmolVLM
Fine-tuning Recipe: https://github.com/huggingface/smollm/blob/main/finetuning/Smol_VLM_FT.ipynb
Also TRL integration for DPO

posted an update 5 months ago
New week, new #cv Gradio app for human understanding (Dref360/human-interaction-demo) 🥳
This demo highlights when a person touches an object. For instance, it is useful to know if someone is touching a wall, a vase or a door. It works for multiple people too!
Still using nielsr/vitpose-base-simple for pose estimation; very excited to see the PR approved!
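The demo's actual code lives in the linked Space; as a rough illustration of the idea only, the core check can be as simple as testing whether hand keypoints from the pose model fall inside an object's bounding box. The function and variable names below are made up for this sketch.

from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels

def point_in_box(x: float, y: float, box: Box) -> bool:
    x0, y0, x1, y1 = box
    return x0 <= x <= x1 and y0 <= y <= y1

def touched_objects(hand_keypoints: List[Tuple[float, float]],
                    objects: Dict[str, Box]) -> List[str]:
    """Return the labels of objects that any hand keypoint lies inside.
    Keypoints would come from a pose model (e.g. ViTPose) and the boxes
    from any object detector."""
    return [label for label, box in objects.items()
            if any(point_in_box(x, y, box) for x, y in hand_keypoints)]

# Example: a wrist keypoint at (420, 310) overlaps the vase's box but not the door's.
print(touched_objects([(420, 310)], {"vase": (400, 280, 460, 360), "door": (0, 0, 100, 500)}))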