Inui

Norm

https://normxu.github.io/

AI & ML interests

Video Diffusion; Large Language Model; Object Detection; OCR

Recent Activity

updated a collection 4 days ago

Image / Video Gen

upvoted a paper 4 days ago

T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT

liked a model 6 days ago

facebook/EdgeTAM

Norm's activity

upvoted a paper 4 days ago

T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT

Paper • 2505.00703 • Published 7 days ago • 39

upvoted a paper 7 days ago

The Leaderboard Illusion

Paper • 2504.20879 • Published 9 days ago • 66

upvoted 2 papers 11 days ago

LiveCC: Learning Video LLM with Streaming Speech Transcription at Scale

Paper • 2504.16030 • Published 16 days ago • 34

TimeChat-Online: 80% Visual Tokens are Naturally Redundant in Streaming Videos

Paper • 2504.17343 • Published 14 days ago • 11

upvoted a paper 12 days ago

Token-Shuffle: Towards High-Resolution Image Generation with Autoregressive Models

Paper • 2504.17789 • Published 14 days ago • 23

upvoted a paper 13 days ago

InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models

Paper • 2504.10479 • Published 24 days ago • 255

upvoted a paper 14 days ago

Kimi-VL Technical Report

Paper • 2504.07491 • Published 28 days ago • 125

upvoted 2 papers 21 days ago

Adding Conditional Control to Text-to-Image Diffusion Models

Paper • 2302.05543 • Published Feb 10, 2023 • 52

Seedream 3.0 Technical Report

Paper • 2504.11346 • Published 23 days ago • 54

upvoted 5 papers 2 months ago

Programming Every Example: Lifting Pre-training Data Quality like Experts at Scale

Paper • 2409.17115 • Published Sep 25, 2024 • 63

SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features

Paper • 2502.14786 • Published Feb 20 • 144

upvoted 2 papers 3 months ago

Qwen2.5-VL Technical Report

Paper • 2502.13923 • Published Feb 19 • 186

Phantom: Subject-consistent video generation via cross-modal alignment

Paper • 2502.11079 • Published Feb 16 • 60

upvoted a collection 3 months ago

Deepseek Papers

Collection

Deepseek papers collection • 20 items • Updated 6 days ago • 195

upvoted 3 papers 3 months ago

Ola: Pushing the Frontiers of Omni-Modal Language Model with Progressive Modality Alignment

Paper • 2502.04328 • Published Feb 6 • 30

Magic 1-For-1: Generating One Minute Video Clips within One Minute

Paper • 2502.07701 • Published Feb 11 • 36

VideoJAM: Joint Appearance-Motion Representations for Enhanced Motion Generation in Video Models

Paper • 2502.02492 • Published Feb 4 • 65