-
EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
Paper • 2402.04252 • Published • 28 -
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
Paper • 2402.03749 • Published • 13 -
ScreenAI: A Vision-Language Model for UI and Infographics Understanding
Paper • 2402.04615 • Published • 44 -
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
Paper • 2402.05008 • Published • 23
Collections
Discover the best community collections!
Collections including paper arxiv:2411.10442
-
Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model
Paper • 2503.24290 • Published • 62 -
I Have Covered All the Bases Here: Interpreting Reasoning Features in Large Language Models via Sparse Autoencoders
Paper • 2503.18878 • Published • 118 -
START: Self-taught Reasoner with Tools
Paper • 2503.04625 • Published • 111 -
DAPO: An Open-Source LLM Reinforcement Learning System at Scale
Paper • 2503.14476 • Published • 124
-
Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search
Paper • 2412.18319 • Published • 40 -
Token-Budget-Aware LLM Reasoning
Paper • 2412.18547 • Published • 47 -
Efficiently Serving LLM Reasoning Programs with Certaindex
Paper • 2412.20993 • Published • 38 -
B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners
Paper • 2412.17256 • Published • 48
-
Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization
Paper • 2411.10442 • Published • 81 -
OpenGVLab/InternVL2_5-78B-MPO
Image-Text-to-Text • Updated • 27.8k • 54 -
OpenGVLab/InternVL2_5-38B-MPO
Image-Text-to-Text • Updated • 78.9k • 20 -
OpenGVLab/InternVL2_5-26B-MPO
Image-Text-to-Text • Updated • 23.7k • 14
-
OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models
Paper • 2411.04905 • Published • 125 -
LLaVA-o1: Let Vision Language Models Reason Step-by-Step
Paper • 2411.10440 • Published • 125 -
Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions
Paper • 2411.14405 • Published • 62 -
ROICtrl: Boosting Instance Control for Visual Generation
Paper • 2411.17949 • Published • 88
-
Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization
Paper • 2411.10442 • Published • 81 -
VisionZip: Longer is Better but Not Necessary in Vision Language Models
Paper • 2412.04467 • Published • 111 -
NVILA: Efficient Frontier Visual Language Models
Paper • 2412.04468 • Published • 60 -
PaliGemma 2: A Family of Versatile VLMs for Transfer
Paper • 2412.03555 • Published • 134
-
Stream of Search (SoS): Learning to Search in Language
Paper • 2404.03683 • Published • 32 -
Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization
Paper • 2411.10442 • Published • 81 -
Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions
Paper • 2411.14405 • Published • 62 -
Hymba: A Hybrid-head Architecture for Small Language Models
Paper • 2411.13676 • Published • 45
-
PUMA: Empowering Unified MLLM with Multi-granular Visual Generation
Paper • 2410.13861 • Published • 57 -
JanusFlow: Harmonizing Autoregression and Rectified Flow for Unified Multimodal Understanding and Generation
Paper • 2411.07975 • Published • 31 -
Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization
Paper • 2411.10442 • Published • 81 -
Multimodal Autoregressive Pre-training of Large Vision Encoders
Paper • 2411.14402 • Published • 47
-
HtmlRAG: HTML is Better Than Plain Text for Modeling Retrieved Knowledge in RAG Systems
Paper • 2411.02959 • Published • 71 -
GarVerseLOD: High-Fidelity 3D Garment Reconstruction from a Single In-the-Wild Image using a Dataset with Levels of Details
Paper • 2411.03047 • Published • 9 -
MVPaint: Synchronized Multi-View Diffusion for Painting Anything 3D
Paper • 2411.02336 • Published • 25 -
GenXD: Generating Any 3D and 4D Scenes
Paper • 2411.02319 • Published • 20