Daily Papers

by AK and the research community

May 8

Submitted by

runninglsy

Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities

·
10 authors

Submitted by

SpaceProduct

ZeroSearch: Incentivize the Search Capability of LLMs without Searching

·
9 authors

Submitted by

Gracjan

Beyond Recognition: Evaluating Visual Perspective Taking in Vision Language Models

·
6 authors

Submitted by

albertge

R&B: Domain Regrouping and Data Mixture Balancing for Efficient Foundation Model Training

·
10 authors

Submitted by

BestWishYsh

HunyuanCustom: A Multimodal-Driven Architecture for Customized Video Generation

·
7 authors

Submitted by

hyz317

PrimitiveAnything: Human-Crafted 3D Primitive Assembly Generation with Auto-Regressive Transformer

·
8 authors

Submitted by

6cf

Benchmarking LLMs' Swarm intelligence

·
4 authors

Submitted by

renqiux0302

Beyond Theorem Proving: Formulation, Framework and Benchmark for Formal Problem-Solving

·
6 authors

Submitted by

itaowe

OmniGIRL: A Multilingual and Multimodal Benchmark for GitHub Issue Resolution

·
10 authors

Submitted by

huangsiteng

OpenHelix: A Short Survey, Empirical Analysis, and Open-Source Dual-System VLA Model for Robotic Manipulation

·
13 authors

Submitted by

PahaII

OpenVision: A Fully-Open, Cost-Effective Family of Advanced Vision Encoders for Multimodal Learning

·
5 authors

Submitted by

mariya-davydova

OSUniverse: Benchmark for Multimodal GUI-navigation AI Agents

·
5 authors

Submitted by

Ningyu

Knowledge Augmented Complex Problem Solving with Large Language Models: A Survey

·
9 authors

Submitted by

VityaVitalich

LLM-Independent Adaptive RAG: Let the Question Speak for Itself

·
9 authors

Submitted by

Tournesol-Saturday

RAIL: Region-Aware Instructive Learning for Semi-Supervised Tooth Segmentation in CBCT

·
7 authors

Submitted by

linxule

Cognitio Emergens: Agency, Dimensions, and Dynamics in Human-AI Knowledge Co-Creation

·
1 author

Submitted by

ProKil

AutoLibra: Agent Metric Induction from Open-Ended Feedback

·
6 authors

Submitted by

Eavn

Uncertainty-Weighted Image-Event Multimodal Fusion for Video Anomaly Detection

·
3 authors

Submitted by

MilaWang

COSMOS: Predictable and Cost-Effective Adaptation of LLMs

·
3 authors