Merve Noyan
merve
AI & ML interests
VLMs, vision & co
Recent Activity
posted an update 3 days ago
Don't sleep on the new AI at Meta vision-language release! 🔥
https://huggingface.co/collections/facebook/perception-encoder-67f977c9a65ca5895a7f6ba1
https://huggingface.co/collections/facebook/perception-lm-67f9783f171948c383ee7498
Meta dropped Swiss Army knives for vision with an Apache 2.0 license
> image/video encoders for vision-language modeling and spatial understanding (object detection, etc.)
> The vision LM outperforms InternVL3 and Qwen2.5VL
> They also release gigantic video and image datasets
The authors set out to build a single versatile vision encoder that can be aligned to a diverse set of tasks.
They trained Perception Encoder (PE) Core: a new state-of-the-art family of vision encoders that can be aligned for both vision-language and spatial tasks. On zero-shot image tasks it outperforms the latest state of the art, SigLIP 2
> Among the fine-tuned ones, the first is PE-Spatial: a model for bounding-box detection, segmentation, and depth estimation, and it outperforms all other models 😮
> The second is PLM, Perception Language Model, where they combine PE-Core with the Qwen2.5 7B LM. It outperforms all other models (including InternVL3, which was also trained with a Qwen2.5 LM!)
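For context, "aligned for vision-language" in CLIP-style encoders like PE-Core means image and text embeddings live in a shared space, so zero-shot classification reduces to cosine similarity between one image embedding and one text embedding per candidate label. Here is a minimal sketch of that scoring step with dummy embeddings (NumPy only; this is not the actual PE API — model loading and preprocessing are omitted):

```python
import numpy as np

def zero_shot_classify(image_emb, text_embs, temperature=0.07):
    """CLIP-style zero-shot classification: cosine similarity between
    an image embedding and per-class text embeddings, softmaxed into
    class probabilities."""
    # L2-normalize so the dot product equals cosine similarity
    img = image_emb / np.linalg.norm(image_emb)
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    logits = txt @ img / temperature
    # numerically stable softmax over classes
    probs = np.exp(logits - logits.max())
    return probs / probs.sum()

# Dummy 4-d embeddings standing in for real encoder outputs
image_emb = np.array([0.9, 0.1, 0.0, 0.2])
text_embs = np.array([
    [1.0, 0.0, 0.0, 0.1],  # e.g. "a photo of a cat"
    [0.0, 1.0, 0.2, 0.0],  # e.g. "a photo of a dog"
])
probs = zero_shot_classify(image_emb, text_embs)
print(probs.argmax())  # class 0 is most similar to the image
```

With a real checkpoint, the image and text embeddings would come from the encoder's image and text towers; the scoring step stays the same.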
The authors release the following checkpoints in sizes base, large and giant:
> 3 PE-Core checkpoints (224, 336, 448)
> 2 PE-Lang checkpoints (L, G)
> One PE-Spatial (G, 448)
> 3 PLM (1B, 3B, 8B)
> Datasets
The authors release the following datasets:
> PE Video: a gigantic video dataset of 1M videos with 120k expert annotations ⏯️
> PLM-Video and PLM-Image: Human and auto-annotated image and video datasets on region-based tasks
> PLM-VideoBench: New video benchmark on MCQA
merve's activity
Cohere on Hugging Face Inference Providers 🔥
published an article about 2 months ago
Welcome Gemma 3: Google's all new multimodal, multilingual, long context open LLM
SigLIP 2: A better multilingual vision language encoder
SmolVLM2: Bringing Video Understanding to Every Device
Open-source DeepResearch – Freeing our search agents
We now support VLMs in smolagents!
SmolVLM Grows Smaller – Introducing the 250M & 500M Models!
Introducing smolagents: simple agents that write actions in code.
Welcome PaliGemma 2 – New vision language models by Google
SmolVLM - small yet mighty Vision Language Model
Llama can now see and run on your device - welcome Llama 3.2
Preference Optimization for Vision Language Models
Fine-tuning Florence-2 - Microsoft's Cutting-edge Vision Language Models
PaliGemma – Google's Cutting-Edge Open Vision Language Model
published an article about 1 year ago
published an article about 1 year ago
PaliGemma 2 Mix - New Instruction Vision Language Models by Google
published an article over 1 year ago
Introduction to Quantization cooked in 🤗 with 🧑‍🍳
published an article over 1 year ago
Deploy MusicGen in no time with Inference Endpoints
published an article almost 2 years ago
Open-Source Text Generation & LLM Ecosystem at Hugging Face
published an article about 2 years ago