Daily Papers

by AK and the research community

May 28

Submitted by

QiushiSun

ScienceBoard: Evaluating Multimodal Autonomous Agents in Realistic Scientific Workflows

·
21 authors

1

Submitted by

JiakangYuan

MME-Reasoning: A Comprehensive Benchmark for Logical Reasoning in MLLMs

·
11 authors

2

Submitted by

KevinQHLin

Paper2Poster: Towards Multimodal Poster Automation from Scientific Papers

·
5 authors

1

Submitted by

yiren98

OmniConsistency: Learning Style-Agnostic Consistency from Paired Stylization Data

·
3 authors

1

Submitted by

BestWishYsh

OpenS2V-Nexus: A Detailed Benchmark and Million-Scale Dataset for Subject-to-Video Generation

·
9 authors

2

Submitted by

MiniMax-AI

SynLogic: Synthesizing Verifiable Reasoning Data at Scale for Learning Logical Reasoning and Beyond

·
15 authors

Submitted by

glebzok

Exploring the Latent Capacity of LLMs for One-Step Text Generation

·
2 authors

1

Submitted by

AmirhoseinGH

Guided by Gut: Efficient Test-Time Scaling with Reinforced Intrinsic Confidence

·
4 authors

1

Submitted by

hassid

Don't Overthink it. Preferring Shorter Thinking Chains for Improved LLM Reasoning

·
4 authors

2

Submitted by

YunxinLi

VerIPO: Cultivating Long Reasoning in Video-LLMs via Verifier-Guided Iterative Policy Optimization

·
8 authors

3

Submitted by

tgy2024

MMMR: Benchmarking Massive Multi-Modal Reasoning Tasks

·
10 authors

Submitted by

AJZhou

UI-Genie: A Self-Improving Approach for Iteratively Boosting MLLM-based Mobile GUI Agents

·
15 authors

1

Submitted by

xihc-ucb

Sparse VideoGen2: Accelerate Video Generation with Sparse Attention via Semantic-Aware Permutation

·
14 authors

1

Submitted by

DogNeverSleep

MME-VideoOCR: Evaluating OCR-Based Capabilities of Multimodal LLMs in Video Scenarios

·
18 authors

1

Submitted by

HyungjunKim

GraLoRA: Granular Low-Rank Adaptation for Parameter-Efficient Fine-Tuning

·
5 authors

1

Submitted by

Howe666

Video-Holmes: Can MLLM Think Like Holmes for Complex Video Reasoning?

·
6 authors

1

Submitted by

XUANMINGZHANG

MetaMind: Modeling Human Social Thoughts with Metacognitive Multi-Agent Systems

·
4 authors

3

Submitted by

lynazhang

rStar-Coder: Scaling Competitive Code Reasoning with a Large-Scale Verified Dataset

·
8 authors

Submitted by

FSCCS

HoliTom: Holistic Token Merging for Fast Video Large Language Models

·
6 authors

Submitted by

che111

Beyond Distillation: Pushing the Limits of Medical LLM Reasoning with Minimalist Rule-Based RL

·
9 authors

1

Submitted by

che111

NOVA: A Benchmark for Anomaly Localization and Clinical Reasoning in Brain MRI

·
15 authors

Submitted by

Shimao-Zhang

How does Alignment Enhance LLMs' Multilingual Capabilities? A Language Neurons Perspective

·
8 authors

1

Submitted by

BestWishYsh

ImgEdit: A Unified Image Editing Dataset and Benchmark

·
8 authors

2

Submitted by

Ningyu

Beyond Prompt Engineering: Robust Behavior Control in LLMs via Steering Target Atoms

·
7 authors

1

Submitted by

ariondas

SweEval: Do LLMs Really Swear? A Safety Benchmark for Testing Limits for Enterprise Use

·
9 authors

2

Submitted by

lkevinzc

Reinforcing General Reasoning without Verifiers

·
9 authors

1

Submitted by

HikariDawn

Frame In-N-Out: Unbounded Controllable Image-to-Video Generation

·
4 authors

1

Submitted by

Z-MU-Z

Active-O3: Empowering Multimodal Large Language Models with Active Perception via GRPO

·
11 authors

1

Submitted by

Geralt-Targaryen

Code Graph Model (CGM): A Graph-Integrated Large Language Model for Repository-Level Software Engineering Tasks

·
15 authors

1

Submitted by

tricktreat

ViewSpatial-Bench: Evaluating Multi-perspective Spatial Localization in Vision-Language Models

·
12 authors

1

Submitted by

leo1117

DetailFlow: 1D Coarse-to-Fine Autoregressive Image Generation via Next-Detail Prediction

·
13 authors

1

Submitted by

judge

SeePhys: Does Seeing Help Thinking? -- Benchmarking Vision-Based Physics Reasoning

·
14 authors

Submitted by

joanrodai

Rendering-Aware Reinforcement Learning for Vector Graphics Generation

·
15 authors

Submitted by

zzwustc

MotionPro: A Precise Motion Controller for Image-to-Video Generation

·
7 authors

2

Submitted by

yushihu

MMMG: a Comprehensive and Reliable Evaluation Suite for Multitask Multimodal Generation

·
12 authors

1

Submitted by

jiaxiaojunQAQ

Adversarial Attacks against Closed-Source MLLMs via Feature Optimal Alignment

·
10 authors

1

Submitted by

wydu

Thinker: Learning to Think Fast and Slow

·
3 authors

Submitted by

Nickyang

Walk Before You Run! Concise LLM Reasoning via Reinforcement Learning

·
2 authors

1

Submitted by

YanAdjeNole

FinTagging: An LLM-ready Benchmark for Extracting and Structuring Financial Information

·
10 authors

1

Submitted by

yunlong10

MMPerspective: Do MLLMs Understand Perspective? A Comprehensive Benchmark for Perspective Perception, Reasoning, and Robustness

·
14 authors

Submitted by

pprp

Can Compressed LLMs Truly Act? An Empirical Evaluation of Agentic Capabilities in LLM Compression

·
6 authors

Submitted by

ofirpress

VideoGameBench: Can Vision-Language Models complete popular video games?

·
4 authors

Submitted by

yrshi

Search and Refine During Think: Autonomous Retrieval-Augmented Reasoning of LLMs

·
8 authors

1

Submitted by

adamdad

Minute-Long Videos with Dual Parallelisms

·
5 authors

Submitted by

ZhangShenao

Beyond Markovian: Reflective Exploration via Bayes-Adaptive RL for LLM Reasoning

·
8 authors

1

Submitted by

yjlee0222

VisualToolAgent (VisTA): A Reinforcement Learning Framework for Visual Tool Selection

·
7 authors

Submitted by

westbrook

SoloSpeech: Enhancing Intelligibility and Quality in Target Speech Extraction through a Cascaded Generative Pipeline

·
10 authors

1

Submitted by

mkoretsky1

BiomedSQL: Text-to-SQL for Scientific Reasoning on Biomedical Knowledge Bases

·
11 authors

1

Submitted by

EliverQ

R1-Searcher++: Incentivizing the Dynamic Knowledge Acquisition of LLMs via Reinforcement Learning

·
10 authors

1

Submitted by

zhennan1

Scaling External Knowledge Input Beyond Context Windows of LLMs via Multi-Agent Collaboration

·
7 authors

1

Submitted by

BestWishYsh

Sci-Fi: Symmetric Constraint for Frame Inbetweening

·
8 authors

1

Submitted by

Neo111x

DFIR-Metric: A Benchmark Dataset for Evaluating Large Language Models in Digital Forensics and Incident Response

·
6 authors

1

Submitted by

Chenfei-Liao

MLLMs are Deeply Affected by Modality Bias

·
18 authors

1

Submitted by

NicerWang

AdInject: Real-World Black-Box Attacks on Web Agents via Advertising Delivery

·
8 authors

1

Submitted by

ChrisJuan

Alita: Generalist Agent Enabling Scalable Agentic Reasoning with Minimal Predefinition and Maximal Self-Evolution

·
18 authors

1

Submitted by

friedrichor

Modality Curation: Building Universal Embeddings for Advanced Multimodal Information Retrieval

·
11 authors

1

Submitted by

Xxlbigbrother

ComfyMind: Toward General-Purpose Generation via Tree-Based Planning and Reactive Feedback

·
8 authors

2

Submitted by

HuanjinYao

R1-ShareVL: Incentivizing Reasoning Capability of Multimodal Large Language Models via Share-GRPO

·
11 authors

1

Submitted by

Ningyu

Spatial Knowledge Graph-Guided Multimodal Synthesis

·
6 authors

Submitted by

xianghuang

Reverse Preference Optimization for Complex Instruction Following

·
8 authors

Submitted by

davidelobba

Inverse Virtual Try-On: Generating Multi-Category Product-Style Images from Clothed Individuals

·
6 authors

1

Submitted by

nielsr

VLM-3R: Vision-Language Models Augmented with Instruction-Aligned 3D Reconstruction

·
17 authors

Submitted by

kotekjedi

Capability-Based Scaling Laws for LLM Red-Teaming

·
4 authors

1

Submitted by

cr8br0ze

Absolute Coordinates Make Motion Generation Easy

·
5 authors

1

Submitted by

Qinsi1

CoreMatching: A Co-adaptive Sparse Inference Framework with Token and Neuron Pruning for Comprehensive Acceleration of Vision-Language Models

·
9 authors

Submitted by

justairr

SATORI-R1: Incentivizing Multimodal Reasoning with Spatial Grounding and Verifiable Rewards

·
4 authors

1

Submitted by

jpd459

Explaining Sources of Uncertainty in Automated Fact-Checking

·
4 authors

Submitted by

Baran47

Tropical Attention: Neural Algorithmic Reasoning for Combinatorial Algorithms

·
4 authors

1

Submitted by

yunhuijang

Improving Chemical Understanding of LLMs via SMILES Parsing

·
3 authors

Submitted by

florin-hf

Do RAG Systems Suffer From Positional Bias?

·
5 authors

1

Submitted by

jinheon

Knowledge Base Construction for Knowledge-Augmented Text-to-SQL

·
7 authors

Submitted by

aluo-x

Vision Transformers with Self-Distilled Registers

·
5 authors

1

Submitted by

hazemessam

Ankh3: Multi-Task Pretraining with Sequence Denoising and Completion Enhances Protein Representations

·
4 authors

1

Submitted by

hazemessam

Beyond Simple Concatenation: Fairly Assessing PLM Architectures for Multi-Chain Protein-Protein Interactions Prediction

·
8 authors

1

Submitted by

andrewzamai

An Explainable Diagnostic Framework for Neurodegenerative Dementias via Reinforcement-Optimized LLM Reasoning

·
6 authors

1

Submitted by

Eleven-P

PreMoe: Lightening MoEs on Constrained Memory by Expert Pruning and Retrieval

·
8 authors

1