Unified Reward Model for Multimodal Understanding and Generation Paper • 2503.05236 • Published Mar 7 • 123
Sketch-of-Thought: Efficient LLM Reasoning with Adaptive Cognitive-Inspired Sketching Paper • 2503.05179 • Published Mar 7 • 46
R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning Paper • 2503.05592 • Published Mar 7 • 27
Forgetting Transformer: Softmax Attention with a Forget Gate Paper • 2503.02130 • Published Mar 3 • 32
VideoPainter: Any-length Video Inpainting and Editing with Plug-and-Play Context Control Paper • 2503.05639 • Published Mar 7 • 24
R1-Omni: Explainable Omni-Multimodal Emotion Recognition with Reinforcing Learning Paper • 2503.05379 • Published Mar 7 • 37
Learning from Failures in Multi-Attempt Reinforcement Learning Paper • 2503.04808 • Published Mar 4 • 18
TinyR1-32B-Preview: Boosting Accuracy with Branch-Merge Distillation Paper • 2503.04872 • Published Mar 6 • 15
TrajectoryCrafter: Redirecting Camera Trajectory for Monocular Videos via Diffusion Models Paper • 2503.05638 • Published Mar 7 • 19
BEHAVIOR Robot Suite: Streamlining Real-World Whole-Body Manipulation for Everyday Household Activities Paper • 2503.05652 • Published Mar 7 • 11
An Empirical Study on Eliciting and Improving R1-like Reasoning Models Paper • 2503.04548 • Published Mar 6 • 8
Linear-MoE: Linear Sequence Modeling Meets Mixture-of-Experts Paper • 2503.05447 • Published Mar 7 • 8
LONGCODEU: Benchmarking Long-Context Language Models on Long Code Understanding Paper • 2503.04359 • Published Mar 6 • 6
EAGLE-3: Scaling up Inference Acceleration of Large Language Models via Training-Time Test Paper • 2503.01840 • Published Mar 3 • 5
Know You First and Be You Better: Modeling Human-Like User Simulators via Implicit Profiles Paper • 2502.18968 • Published Feb 26 • 3
AnyAnomaly: Zero-Shot Customizable Video Anomaly Detection with LVLM Paper • 2503.04504 • Published Mar 6 • 3
YuE: Scaling Open Foundation Models for Long-Form Music Generation Paper • 2503.08638 • Published Mar 11 • 65
SegAgent: Exploring Pixel Understanding Capabilities in MLLMs by Imitating Human Annotator Trajectories Paper • 2503.08625 • Published Mar 11 • 26
Seedream 2.0: A Native Chinese-English Bilingual Image Generation Foundation Model Paper • 2503.07703 • Published Mar 10 • 36
Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models Paper • 2503.09573 • Published Mar 12 • 71
Multimodal Language Modeling for High-Accuracy Single Cell Transcriptomics Analysis and Generation Paper • 2503.09427 • Published Mar 12 • 4
LightGen: Efficient Image Generation through Knowledge Distillation and Direct Preference Optimization Paper • 2503.08619 • Published Mar 11 • 20
OmniMamba: Efficient and Unified Multimodal Understanding and Generation via State Space Models Paper • 2503.08686 • Published Mar 11 • 19
Exploiting Instruction-Following Retrievers for Malicious Information Retrieval Paper • 2503.08644 • Published Mar 11 • 16
Robusto-1 Dataset: Comparing Humans and VLMs on real out-of-distribution Autonomous Driving VQA from Peru Paper • 2503.07587 • Published Mar 10 • 11
^RFLAV: Rolling Flow matching for infinite Audio Video generation Paper • 2503.08307 • Published Mar 11 • 9
BiasEdit: Debiasing Stereotyped Language Models via Model Editing Paper • 2503.08588 • Published Mar 11 • 7
AnyMoLe: Any Character Motion In-betweening Leveraging Video Diffusion Models Paper • 2503.08417 • Published Mar 11 • 8
Benchmarking AI Models in Software Engineering: A Review, Search Tool, and Enhancement Protocol Paper • 2503.05860 • Published Mar 7 • 10
Perplexity Trap: PLM-Based Retrievers Overrate Low Perplexity Documents Paper • 2503.08684 • Published Mar 11 • 5
More Documents, Same Length: Isolating the Challenge of Multiple Documents in RAG Paper • 2503.04388 • Published Mar 6 • 16
Quantizing Large Language Models for Code Generation: A Differentiated Replication Paper • 2503.07103 • Published Mar 10 • 8
MoC: Mixtures of Text Chunking Learners for Retrieval-Augmented Generation System Paper • 2503.09600 • Published Mar 12 • 4
PhysicsGen: Can Generative Models Learn from Images to Predict Complex Physical Relations? Paper • 2503.05333 • Published Mar 7 • 8
Technologies on Effectiveness and Efficiency: A Survey of State Spaces Models Paper • 2503.11224 • Published Mar 14 • 27
Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey Paper • 2503.12605 • Published Mar 16 • 34
Rewards Are Enough for Fast Photo-Realistic Text-to-image Generation Paper • 2503.13070 • Published Mar 17 • 9
Reflect-DiT: Inference-Time Scaling for Text-to-Image Diffusion Transformers via In-Context Reflection Paper • 2503.12271 • Published Mar 15 • 9
Pensez: Less Data, Better Reasoning -- Rethinking French LLM Paper • 2503.13661 • Published Mar 17 • 5
CoLMDriver: LLM-based Negotiation Benefits Cooperative Autonomous Driving Paper • 2503.08683 • Published Mar 11 • 1
DAPO: An Open-Source LLM Reinforcement Learning System at Scale Paper • 2503.14476 • Published Mar 18 • 124
STEVE: AStep Verification Pipeline for Computer-use Agent Training Paper • 2503.12532 • Published Mar 16 • 15
GKG-LLM: A Unified Framework for Generalized Knowledge Graph Construction Paper • 2503.11227 • Published Mar 14 • 24
SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks Paper • 2503.15478 • Published Mar 19 • 10
ELTEX: A Framework for Domain-Driven Synthetic Data Generation Paper • 2503.15055 • Published Mar 19 • 6
JARVIS-VLA: Post-Training Large-Scale Vision Language Models to Play Visual Games with Keyboards and Mouse Paper • 2503.16365 • Published Mar 20 • 40
Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn't Paper • 2503.16219 • Published Mar 20 • 48
MAPS: A Multi-Agent Framework Based on Big Seven Personality and Socratic Guidance for Multimodal Scientific Problem Solving Paper • 2503.16905 • Published Mar 21 • 54
MARS: A Multi-Agent Framework Incorporating Socratic Guidance for Automated Prompt Optimization Paper • 2503.16874 • Published Mar 21 • 44
UI-R1: Enhancing Action Prediction of GUI Agents by Reinforcement Learning Paper • 2503.21620 • Published Mar 27 • 62
Large Language Model Agent: A Survey on Methodology, Applications and Challenges Paper • 2503.21460 • Published Mar 27 • 77
ReaRAG: Knowledge-guided Reasoning Enhances Factuality of Large Reasoning Models with Iterative Retrieval Augmented Generation Paper • 2503.21729 • Published Mar 27 • 28
Exploring the Evolution of Physics Cognition in Video Generation: A Survey Paper • 2503.21765 • Published Mar 27 • 11
Think Before Recommend: Unleashing the Latent Reasoning Power for Sequential Recommendation Paper • 2503.22675 • Published Mar 28 • 35
Reasoning-SQL: Reinforcement Learning with SQL Tailored Partial Rewards for Reasoning-Enhanced Text-to-SQL Paper • 2503.23157 • Published Mar 29 • 9
Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems Paper • 2504.01990 • Published Mar 31 • 275
Rethinking RL Scaling for Vision Language Models: A Transparent, From-Scratch Framework and Comprehensive Evaluation Scheme Paper • 2504.02587 • Published Apr 3 • 30
DeepSeek-R1 Thoughtology: Let's <think> about LLM Reasoning Paper • 2504.07128 • Published Apr 2 • 83
MM-IFEngine: Towards Multimodal Instruction Following Paper • 2504.07957 • Published 28 days ago • 34
SQL-R1: Training Natural Language to SQL Reasoning Model By Reinforcement Learning Paper • 2504.08600 • Published 27 days ago • 27
WORLDMEM: Long-term Consistent World Simulation with Memory Paper • 2504.12369 • Published 22 days ago • 32
VistaDPO: Video Hierarchical Spatial-Temporal Direct Preference Optimization for Large Video Models Paper • 2504.13122 • Published 21 days ago • 21
The Bitter Lesson Learned from 2,000+ Multilingual Benchmarks Paper • 2504.15521 • Published 17 days ago • 63
Describe Anything: Detailed Localized Image and Video Captioning Paper • 2504.16072 • Published 16 days ago • 60
MR. Video: "MapReduce" is the Principle for Long Video Understanding Paper • 2504.16082 • Published 16 days ago • 6
LiveCC: Learning Video LLM with Streaming Speech Transcription at Scale Paper • 2504.16030 • Published 16 days ago • 34
Paper2Code: Automating Code Generation from Scientific Papers in Machine Learning Paper • 2504.17192 • Published 15 days ago • 105