MIG: Automatic Data Selection for Instruction Tuning by Maximizing Information Gain in Semantic Space Paper • 2504.13835 • Published 19 days ago • 36
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models Paper • 2504.10479 • Published 23 days ago • 255
RIG: Synergizing Reasoning and Imagination in End-to-End Generalist Policy Paper • 2503.24388 • Published Mar 31 • 30
Creation-MMBench: Assessing Context-Aware Creative Intelligence in MLLM Paper • 2503.14478 • Published Mar 18 • 47
Running • 106 • Open VLM Video Leaderboard: VLMEvalKit Eval Results in video understanding benchmark
FoleyCrafter: Bring Silent Videos to Life with Lifelike and Synchronized Sounds Paper • 2407.01494 • Published Jul 1, 2024 • 15
PIA: Your Personalized Image Animator via Plug-and-Play Modules in Text-to-Image Models Paper • 2312.13964 • Published Dec 21, 2023 • 20