Frank Sommers PRO

fsommers

AI & ML interests

None yet

fsommers's activity

upvoted a collection 2 days ago

D-FINE

Collection

State-of-the-art real-time object detection model with Apache 2.0 licence • 15 items • Updated 3 days ago • 48

upvoted a paper 10 days ago

BitNet v2: Native 4-bit Activations with Hadamard Transformation for 1-bit LLMs

Paper • 2504.18415 • Published 13 days ago • 41

upvoted a paper 16 days ago

Many-Shot In-Context Learning in Multimodal Foundation Models

Paper • 2405.09798 • Published May 16, 2024 • 33

upvoted a paper 29 days ago

SmolVLM: Redefining small and efficient multimodal models

Paper • 2504.05299 • Published Apr 7 • 180

liked a model about 1 month ago

reducto/RolmOCR

Image-Text-to-Text • Updated Apr 2 • 115k • 397

upvoted an article about 1 month ago

Tool Use, Unified

Article • Aug 12, 2024 • 103

upvoted 2 papers about 1 month ago

MDocAgent: A Multi-Modal Multi-Agent Framework for Document Understanding

Paper • 2503.13964 • Published Mar 18 • 19

LLaVA-Grounding: Grounded Visual Chat with Large Multimodal Models

Paper • 2312.02949 • Published Dec 5, 2023 • 15

liked a model about 1 month ago

Qwen/Qwen2.5-VL-32B-Instruct

Image-Text-to-Text • Updated 24 days ago • 382k • 353

upvoted 2 papers about 2 months ago

TULIP: Towards Unified Language-Image Pretraining

Paper • 2503.15485 • Published Mar 19 • 48

Aligning Multimodal LLM with Human Preference: A Survey

Paper • 2503.14504 • Published Mar 18 • 23

upvoted a collection about 2 months ago

Gemma 3 Release

Collection

24 items • Updated 20 days ago • 357

upvoted a paper 2 months ago

NitiBench: A Comprehensive Studies of LLM Frameworks Capabilities for Thai Legal Question Answering

Paper • 2502.10868 • Published Feb 15 • 2

updated a collection 2 months ago

Misc papers

Collection

14 items • Updated Mar 4

upvoted a paper 2 months ago

ViDoRAG: Visual Document Retrieval-Augmented Generation via Dynamic Iterative Reasoning Agents

Paper • 2502.18017 • Published Feb 25 • 20

upvoted 2 articles 2 months ago

SmolVLM2: Bringing Video Understanding to Every Device

Article • Feb 20 • 243

SigLIP 2: A better multilingual vision language encoder

Article • Feb 21 • 161

upvoted a paper 2 months ago

Executable Code Actions Elicit Better LLM Agents

Paper • 2402.01030 • Published Feb 1, 2024 • 132

upvoted 2 papers 3 months ago

Scalable Vision Language Model Training via High Quality Data Curation

Paper • 2501.05952 • Published Jan 10 • 2

Qwen2.5-VL Technical Report

Paper • 2502.13923 • Published Feb 19 • 186