Voila Collection Voila: Voice-Language Foundation Models. https://voila.maitrix.org • 7 items • Updated 2 days ago • 14
DeepCritic: Deliberate Critique with Large Language Models Paper • 2505.00662 • Published 7 days ago • 48
WebThinker: Empowering Large Reasoning Models with Deep Research Capability Paper • 2504.21776 • Published 8 days ago • 41
YoChameleon: Personalized Vision and Language Generation Paper • 2504.20998 • Published 9 days ago • 11
Reinforcement Learning for Reasoning in Large Language Models with One Training Example Paper • 2504.20571 • Published 9 days ago • 88
VideoVista-CulturalLingo: 360° Horizons-Bridging Cultures, Languages, and Domains in Video Comprehension Paper • 2504.17821 • Published 15 days ago • 21
Skywork R1V2: Multimodal Hybrid Reinforcement Learning for Reasoning Paper • 2504.16656 • Published 15 days ago • 53
BitNet v2: Native 4-bit Activations with Hadamard Transformation for 1-bit LLMs Paper • 2504.18415 • Published 13 days ago • 41
Can Large Language Models Help Multimodal Language Analysis? MMLA: A Comprehensive Benchmark Paper • 2504.16427 • Published 15 days ago • 17
Perspective-Aware Reasoning in Vision-Language Models via Mental Imagery Simulation Paper • 2504.17207 • Published 14 days ago • 29
Breaking the Modality Barrier: Universal Embedding Learning with Multimodal LLMs Paper • 2504.17432 • Published 14 days ago • 38
Paper2Code: Automating Code Generation from Scientific Papers in Machine Learning Paper • 2504.17192 • Published 14 days ago • 105
RefVNLI: Towards Scalable Evaluation of Subject-driven Text-to-image Generation Paper • 2504.17502 • Published 14 days ago • 54
Step1X-Edit: A Practical Framework for General Image Editing Paper • 2504.17761 • Published 14 days ago • 86
Token-Shuffle: Towards High-Resolution Image Generation with Autoregressive Models Paper • 2504.17789 • Published 14 days ago • 23
WALL-E 2.0: World Alignment by NeuroSymbolic Learning improves World Model-based LLM Agents Paper • 2504.15785 • Published 16 days ago • 19