Models
Datasets
Spaces
Posts
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2502.13923

Vision-language model series based on Qwen2.5

Running

104

104

Qwen2.5 VL 32B Instruct Demo

🏃

Chat with images and videos using Qwen
Running

233

233

Qwen2.5 VL 72B Instruct

💻

Chat with an AI that understands text and images
Qwen2.5-VL Technical Report

Paper • 2502.13923 • Published Feb 19 • 186
Qwen/Qwen2.5-VL-32B-Instruct

Image-Text-to-Text • Updated 24 days ago • 382k • • 352

about 7 hours ago

MLLM-as-a-Judge for Image Safety without Human Labeling

Paper • 2501.00192 • Published Dec 31, 2024 • 31
2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining

Paper • 2501.00958 • Published Jan 1 • 107
Xmodel-2 Technical Report

Paper • 2412.19638 • Published Dec 27, 2024 • 27
HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs

Paper • 2412.18925 • Published Dec 25, 2024 • 102

EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters

Paper • 2402.04252 • Published Feb 6, 2024 • 28
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models

Paper • 2402.03749 • Published Feb 6, 2024 • 13
ScreenAI: A Vision-Language Model for UI and Infographics Understanding

Paper • 2402.04615 • Published Feb 7, 2024 • 44
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss

Paper • 2402.05008 • Published Feb 7, 2024 • 23

interesting stuff

Chain-of-Verification Reduces Hallucination in Large Language Models

Paper • 2309.11495 • Published Sep 20, 2023 • 39
Adapting Large Language Models via Reading Comprehension

Paper • 2309.09530 • Published Sep 18, 2023 • 78
CulturaX: A Cleaned, Enormous, and Multilingual Dataset for Large Language Models in 167 Languages

Paper • 2309.09400 • Published Sep 17, 2023 • 85
Language Modeling Is Compression

Paper • 2309.10668 • Published Sep 19, 2023 • 83

Qwen2.5-VL Technical Report

Paper • 2502.13923 • Published Feb 19 • 186
Gemma 3 Technical Report

Paper • 2503.19786 • Published Mar 25 • 50

R1-Onevision: Advancing Generalized Multimodal Reasoning through Cross-Modal Formalization

Paper • 2503.10615 • Published Mar 13 • 17
UniGoal: Towards Universal Zero-shot Goal-oriented Navigation

Paper • 2503.10630 • Published Mar 13 • 6
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning

Paper • 2503.09516 • Published Mar 12 • 30
LMM-R1: Empowering 3B LMMs with Strong Reasoning Abilities Through Two-Stage Rule-Based RL

Paper • 2503.07536 • Published Mar 10 • 86

Large Language Diffusion Models

Paper • 2502.09992 • Published Feb 14 • 115
MM-RLHF: The Next Step Forward in Multimodal LLM Alignment

Paper • 2502.10391 • Published Feb 14 • 35
Diverse Inference and Verification for Advanced Reasoning

Paper • 2502.09955 • Published Feb 14 • 18
Selective Self-to-Supervised Fine-Tuning for Generalization in Large Language Models

Paper • 2502.08130 • Published Feb 12 • 9

Qwen2.5-VL Technical Report

Paper • 2502.13923 • Published Feb 19 • 186
Scaling Laws for Native Multimodal Models Scaling Laws for Native Multimodal Models

Paper • 2504.07951 • Published 28 days ago • 27

SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features

Paper • 2502.14786 • Published Feb 20 • 144
LongWriter-V: Enabling Ultra-Long and High-Fidelity Generation in Vision-Language Models

Paper • 2502.14834 • Published Feb 20 • 24
Qwen2.5-VL Technical Report

Paper • 2502.13923 • Published Feb 19 • 186
DICEPTION: A Generalist Diffusion Model for Visual Perceptual Tasks

Paper • 2502.17157 • Published Feb 24 • 53

Improved GUI Grounding via Iterative Narrowing

Paper • 2411.13591 • Published Nov 18, 2024
Qwen2.5-VL Technical Report

Paper • 2502.13923 • Published Feb 19 • 186

Previous
1
2
3
Next

Company

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs