Daily Papers

by AK and the research community

Apr 14

Submitted by roadjiang

Seaweed-7B: Cost-Effective Training of Video Generation Foundation Model

54 authors

Submitted by YuuTennYi

GigaTok: Scaling Visual Tokenizers to 3 Billion Parameters for Autoregressive Image Generation

5 authors

Submitted by BestWishYsh

MineWorld: a Real-Time and Open-Source Interactive World Model on Minecraft

7 authors

Submitted by tianchez

VLM-R1: A Stable and Generalizable R1-style Large Vision-Language Model

12 authors

Submitted by ZhuangXialie

SQL-R1: Training Natural Language to SQL Reasoning Model By Reinforcement Learning

6 authors

Submitted by ShoufaChen

PixelFlow: Pixel-Space Generative Models with Flow

5 authors

Submitted by yeates

ZipIR: Latent Pyramid Diffusion Transformer for High-Resolution Image Restoration

10 authors

Submitted by BestWishYsh

FlexIP: Dynamic Control of Preservation and Personality for Customized Image Generation

4 authors

Submitted by akhaliq

Visual Chronicles: Using Multimodal LLMs to Analyze Massive Collections of Images

7 authors

Submitted by DannyLan

Do PhD-level LLMs Truly Grasp Elementary Addition? Probing Rule Learning vs. Memorization in Large Language Models

4 authors

Submitted by stefan-it

ModernBERT or DeBERTaV3? Examining Architecture and Data Influence on Transformer Encoder Models Performance

3 authors

Submitted by sauradip

In-2-4D: Inbetweening from Two Single-View Images to 4D Generation

4 authors

Submitted by AdinaY

Pangu Ultra: Pushing the Limits of Dense Large Language Models on Ascend NPUs

52 authors

Submitted by aashiqmuhamed

CoRAG: Collaborative Retrieval-Augmented Generation

3 authors

Submitted by jialuliluka

Training-free Guidance in Text-to-Video Generation via Multimodal Planning and Structured Noise Initialization

6 authors

Submitted by nielsr

UKBOB: One Billion MRI Labeled Masks for Generalizable 3D Medical Image Segmentation

3 authors

Submitted by richard-guyunqi

BlenderGym: Benchmarking Foundational Model Systems for Graphics Editing

5 authors

Submitted by gabrielelozupone98

Latent Diffusion Autoencoders: Toward Efficient and Meaningful Unsupervised Representation Learning in Medical Imaging

6 authors

Submitted by ruipeterpan

SpecReason: Fast and Accurate Inference-Time Compute via Speculative Reasoning

6 authors

Submitted by saidwivedi

InteractVLM: 3D Interaction Reasoning from 2D Foundational Models

7 authors

Submitted by aashiqmuhamed

SAEs Can Improve Unlearning: Dynamic Sparse Autoencoder Guardrails for Precision Unlearning in LLMs

4 authors