-
MLLM-as-a-Judge for Image Safety without Human Labeling
Paper • 2501.00192 • Published • 31 -
2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining
Paper • 2501.00958 • Published • 107 -
Xmodel-2 Technical Report
Paper • 2412.19638 • Published • 27 -
HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs
Paper • 2412.18925 • Published • 102
Collections
Discover the best community collections!
Collections including paper arxiv:2503.20020
-
Gemini Robotics: Bringing AI into the Physical World
Paper • 2503.20020 • Published • 25 -
Magma: A Foundation Model for Multimodal AI Agents
Paper • 2502.13130 • Published • 58 -
LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents
Paper • 2311.05437 • Published • 50 -
OS-ATLAS: A Foundation Action Model for Generalist GUI Agents
Paper • 2410.23218 • Published • 51
-
Cosmos-Reason1: From Physical Common Sense To Embodied Reasoning
Paper • 2503.15558 • Published • 46 -
Humanoid Policy ~ Human Policy
Paper • 2503.13441 • Published -
RoboFactory: Exploring Embodied Agent Collaboration with Compositional Constraints
Paper • 2503.16408 • Published • 40 -
Dita: Scaling Diffusion Transformer for Generalist Vision-Language-Action Policy
Paper • 2503.19757 • Published • 50
-
R1-Onevision: Advancing Generalized Multimodal Reasoning through Cross-Modal Formalization
Paper • 2503.10615 • Published • 17 -
UniGoal: Towards Universal Zero-shot Goal-oriented Navigation
Paper • 2503.10630 • Published • 6 -
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
Paper • 2503.09516 • Published • 30 -
LMM-R1: Empowering 3B LMMs with Strong Reasoning Abilities Through Two-Stage Rule-Based RL
Paper • 2503.07536 • Published • 86
-
GRUtopia: Dream General Robots in a City at Scale
Paper • 2407.10943 • Published • 26 -
Make-An-Agent: A Generalizable Policy Network Generator with Behavior-Prompted Diffusion
Paper • 2407.10973 • Published • 11 -
Cross Anything: General Quadruped Robot Navigation through Complex Terrains
Paper • 2407.16412 • Published • 6 -
RP1M: A Large-Scale Motion Dataset for Piano Playing with Bi-Manual Dexterous Robot Hands
Paper • 2408.11048 • Published • 4
-
MotionLLM: Understanding Human Behaviors from Human Motions and Videos
Paper • 2405.20340 • Published • 21 -
Spectrally Pruned Gaussian Fields with Neural Compensation
Paper • 2405.00676 • Published • 10 -
Paint by Inpaint: Learning to Add Image Objects by Removing Them First
Paper • 2404.18212 • Published • 30 -
LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4, A Technical Report
Paper • 2405.00732 • Published • 122
-
Groma: Localized Visual Tokenization for Grounding Multimodal Large Language Models
Paper • 2404.13013 • Published • 32 -
Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing
Paper • 2404.12253 • Published • 56 -
Data-Efficient Contrastive Language-Image Pretraining: Prioritizing Data Quality over Quantity
Paper • 2403.12267 • Published -
No More Adam: Learning Rate Scaling at Initialization is All You Need
Paper • 2412.11768 • Published • 44
-
Compression Represents Intelligence Linearly
Paper • 2404.09937 • Published • 28 -
MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies
Paper • 2404.06395 • Published • 23 -
Long-context LLMs Struggle with Long In-context Learning
Paper • 2404.02060 • Published • 38 -
Are large language models superhuman chemists?
Paper • 2404.01475 • Published • 19