New Papers - a ThreeSR Collection

ThreeSR's Collections

New Papers

updated 7 days ago

R1-Onevision: Advancing Generalized Multimodal Reasoning through Cross-Modal Formalization

Paper • 2503.10615 • Published Mar 13 • 17
UniGoal: Towards Universal Zero-shot Goal-oriented Navigation

Paper • 2503.10630 • Published Mar 13 • 6
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning

Paper • 2503.09516 • Published Mar 12 • 30
LMM-R1: Empowering 3B LMMs with Strong Reasoning Abilities Through Two-Stage Rule-Based RL

Paper • 2503.07536 • Published Mar 10 • 86
Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning

Paper • 2503.07572 • Published Mar 10 • 44
SegAgent: Exploring Pixel Understanding Capabilities in MLLMs by Imitating Human Annotator Trajectories

Paper • 2503.08625 • Published Mar 11 • 26
OTTER: A Vision-Language-Action Model with Text-Aware Visual Feature Extraction

Paper • 2503.03734 • Published Mar 5 • 1
Light-R1: Curriculum SFT, DPO and RL for Long COT from Scratch and Beyond

Paper • 2503.10460 • Published Mar 13 • 28
Embodied-Reasoner: Synergizing Visual Search, Reasoning, and Action for Embodied Interactive Tasks

Paper • 2503.21696 • Published Mar 27 • 22
ViLBench: A Suite for Vision-Language Process Reward Modeling

Paper • 2503.20271 • Published Mar 26 • 7
Gemini Robotics: Bringing AI into the Physical World

Paper • 2503.20020 • Published Mar 25 • 25
Qwen2.5-Omni Technical Report

Paper • 2503.20215 • Published Mar 26 • 150
Dita: Scaling Diffusion Transformer for Generalist Vision-Language-Action Policy

Paper • 2503.19757 • Published Mar 25 • 50
AppAgentX: Evolving GUI Agents as Proficient Smartphone Users

Paper • 2503.02268 • Published Mar 4 • 11
Unified Reward Model for Multimodal Understanding and Generation

Paper • 2503.05236 • Published Mar 7 • 123
R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning

Paper • 2503.05592 • Published Mar 7 • 27
VideoMind: A Chain-of-LoRA Agent for Long Video Reasoning

Paper • 2503.13444 • Published Mar 17 • 16
Vamba: Understanding Hour-Long Videos with Hybrid Mamba-Transformers

Paper • 2503.11579 • Published Mar 14 • 20
VisualPRM: An Effective Process Reward Model for Multimodal Reasoning

Paper • 2503.10291 • Published Mar 13 • 36
Cosmos-Reason1: From Physical Common Sense To Embodied Reasoning

Paper • 2503.15558 • Published Mar 18 • 46
DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding

Paper • 2503.12797 • Published Mar 17 • 30
Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems

Paper • 2504.01990 • Published Mar 31 • 275
Chapter-Llama: Efficient Chaptering in Hour-Long Videos with LLMs

Paper • 2504.00072 • Published Mar 31 • 7
Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model

Paper • 2503.24290 • Published Mar 31 • 62
Phi-4-Mini Technical Report: Compact yet Powerful Multimodal Language Models via Mixture-of-LoRAs

Paper • 2503.01743 • Published Mar 3 • 87
VEM: Environment-Free Exploration for Training GUI Agent with Value Environment Model

Paper • 2502.18906 • Published Feb 26 • 12
Qwen2.5-VL Technical Report

Paper • 2502.13923 • Published Feb 19 • 186
Agentic Knowledgeable Self-awareness

Paper • 2504.03553 • Published Apr 4 • 28
Slow-Fast Architecture for Video Multi-Modal Large Language Models

Paper • 2504.01328 • Published Apr 2 • 8
Kimi-VL Technical Report

Paper • 2504.07491 • Published 29 days ago • 125
SoTA with Less: MCTS-Guided Sample Selection for Data-Efficient Visual Reasoning Self-Improvement

Paper • 2504.07934 • Published 28 days ago • 18
OLMoTrace: Tracing Language Model Outputs Back to Trillions of Training Tokens

Paper • 2504.07096 • Published 29 days ago • 73
Caption Anything in Video: Fine-grained Object-centric Captioning via Spatiotemporal Multimodal Prompting

Paper • 2504.05541 • Published Apr 7 • 16
VideoChat-R1: Enhancing Spatio-Temporal Perception via Reinforcement Fine-Tuning

Paper • 2504.06958 • Published 30 days ago • 11
Efficient Reinforcement Finetuning via Adaptive Curriculum Learning

Paper • 2504.05520 • Published Apr 7 • 10
ShieldAgent: Shielding Agents via Verifiable Safety Policy Reasoning

Paper • 2503.22738 • Published Mar 26 • 16
APIGen-MT: Agentic Pipeline for Multi-Turn Data Generation via Simulated Agent-Human Interplay

Paper • 2504.03601 • Published Apr 4 • 16
SynWorld: Virtual Scenario Synthesis for Agentic Action Knowledge Refinement

Paper • 2504.03561 • Published Apr 4 • 18
Towards Trustworthy GUI Agents: A Survey

Paper • 2503.23434 • Published Mar 30 • 21
Sleep-time Compute: Beyond Inference Scaling at Test-time

Paper • 2504.13171 • Published 21 days ago • 15
BitNet b1.58 2B4T Technical Report

Paper • 2504.12285 • Published 22 days ago • 70
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models

Paper • 2504.10479 • Published 24 days ago • 255
AgentRewardBench: Evaluating Automatic Evaluations of Web Agent Trajectories

Paper • 2504.08942 • Published 27 days ago • 27
TinyLLaVA-Video-R1: Towards Smaller LMMs for Video Reasoning

Paper • 2504.09641 • Published 25 days ago • 16
M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models

Paper • 2504.10449 • Published 24 days ago • 11
SpecReason: Fast and Accurate Inference-Time Compute via Speculative Reasoning

Paper • 2504.07891 • Published 28 days ago • 5
ReTool: Reinforcement Learning for Strategic Tool Use in LLMs

Paper • 2504.11536 • Published 23 days ago • 60
SFT or RL? An Early Investigation into Training R1-Like Reasoning Large Vision-Language Models

Paper • 2504.11468 • Published 28 days ago • 28
Skywork R1V2: Multimodal Hybrid Reinforcement Learning for Reasoning

Paper • 2504.16656 • Published 16 days ago • 54
Paper2Code: Automating Code Generation from Scientific Papers in Machine Learning

Paper • 2504.17192 • Published 15 days ago • 106
Process Reward Models That Think

Paper • 2504.16828 • Published 15 days ago • 16
Describe Anything: Detailed Localized Image and Video Captioning

Paper • 2504.16072 • Published 16 days ago • 60
Progent: Programmable Privilege Control for LLM Agents

Paper • 2504.11703 • Published 23 days ago • 7
TTRL: Test-Time Reinforcement Learning

Paper • 2504.16084 • Published 16 days ago • 102
Learning to Reason under Off-Policy Guidance

Paper • 2504.14945 • Published 18 days ago • 80
ToolRL: Reward is All Tool Learning Needs

Paper • 2504.13958 • Published 22 days ago • 43
LearnAct: Few-Shot Mobile GUI Agent with a Unified Demonstration Benchmark

Paper • 2504.13805 • Published 20 days ago • 12
