Models
Datasets
Spaces
Posts
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2503.20020

about 14 hours ago

MLLM-as-a-Judge for Image Safety without Human Labeling

Paper • 2501.00192 • Published Dec 31, 2024 • 31
2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining

Paper • 2501.00958 • Published Jan 1 • 107
Xmodel-2 Technical Report

Paper • 2412.19638 • Published Dec 27, 2024 • 27
HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs

Paper • 2412.18925 • Published Dec 25, 2024 • 102

Gemini Robotics: Bringing AI into the Physical World

Paper • 2503.20020 • Published Mar 25 • 25

Multimodal Action

Gemini Robotics: Bringing AI into the Physical World

Paper • 2503.20020 • Published Mar 25 • 25
Magma: A Foundation Model for Multimodal AI Agents

Paper • 2502.13130 • Published Feb 18 • 58
LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents

Paper • 2311.05437 • Published Nov 9, 2023 • 50
OS-ATLAS: A Foundation Action Model for Generalist GUI Agents

Paper • 2410.23218 • Published Oct 30, 2024 • 51

Cosmos-Reason1: From Physical Common Sense To Embodied Reasoning

Paper • 2503.15558 • Published Mar 18 • 46
Humanoid Policy ~ Human Policy

Paper • 2503.13441 • Published Mar 17
RoboFactory: Exploring Embodied Agent Collaboration with Compositional Constraints

Paper • 2503.16408 • Published Mar 20 • 40
Dita: Scaling Diffusion Transformer for Generalist Vision-Language-Action Policy

Paper • 2503.19757 • Published Mar 25 • 50

R1-Onevision: Advancing Generalized Multimodal Reasoning through Cross-Modal Formalization

Paper • 2503.10615 • Published Mar 13 • 17
UniGoal: Towards Universal Zero-shot Goal-oriented Navigation

Paper • 2503.10630 • Published Mar 13 • 6
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning

Paper • 2503.09516 • Published Mar 12 • 30
LMM-R1: Empowering 3B LMMs with Strong Reasoning Abilities Through Two-Stage Rule-Based RL

Paper • 2503.07536 • Published Mar 10 • 86

FM_Training_Infra

Slamming: Training a Speech Language Model on One GPU in a Day

Paper • 2502.15814 • Published Feb 19 • 70
Gemini Robotics: Bringing AI into the Physical World

Paper • 2503.20020 • Published Mar 25 • 25
Gemma 3 Technical Report

Paper • 2503.19786 • Published Mar 25 • 50

GRUtopia: Dream General Robots in a City at Scale

Paper • 2407.10943 • Published Jul 15, 2024 • 26
Make-An-Agent: A Generalizable Policy Network Generator with Behavior-Prompted Diffusion

Paper • 2407.10973 • Published Jul 15, 2024 • 11
Cross Anything: General Quadruped Robot Navigation through Complex Terrains

Paper • 2407.16412 • Published Jul 23, 2024 • 6
RP1M: A Large-Scale Motion Dataset for Piano Playing with Bi-Manual Dexterous Robot Hands

Paper • 2408.11048 • Published Aug 20, 2024 • 4

MotionLLM: Understanding Human Behaviors from Human Motions and Videos

Paper • 2405.20340 • Published May 30, 2024 • 21
Spectrally Pruned Gaussian Fields with Neural Compensation

Paper • 2405.00676 • Published May 1, 2024 • 10
Paint by Inpaint: Learning to Add Image Objects by Removing Them First

Paper • 2404.18212 • Published Apr 28, 2024 • 30
LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4, A Technical Report

Paper • 2405.00732 • Published Apr 29, 2024 • 122

about 1 month ago

Groma: Localized Visual Tokenization for Grounding Multimodal Large Language Models

Paper • 2404.13013 • Published Apr 19, 2024 • 32
Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing

Paper • 2404.12253 • Published Apr 18, 2024 • 56
Data-Efficient Contrastive Language-Image Pretraining: Prioritizing Data Quality over Quantity

Paper • 2403.12267 • Published Mar 18, 2024
No More Adam: Learning Rate Scaling at Initialization is All You Need

Paper • 2412.11768 • Published Dec 16, 2024 • 44

LM Capabilities and Scaling

Compression Represents Intelligence Linearly

Paper • 2404.09937 • Published Apr 15, 2024 • 28
MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies

Paper • 2404.06395 • Published Apr 9, 2024 • 23
Long-context LLMs Struggle with Long In-context Learning

Paper • 2404.02060 • Published Apr 2, 2024 • 38
Are large language models superhuman chemists?

Paper • 2404.01475 • Published Apr 1, 2024 • 19

Previous
1
2
Next

Company

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs