Eagle 2.5: Boosting Long-Context Post-Training for Frontier Vision-Language Models Paper • 2504.15271 • Published 17 days ago • 65
Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders Paper • 2408.15998 • Published Aug 28, 2024 • 88
Mosaic IT: Enhancing Instruction Tuning with Data Mosaics Paper • 2405.13326 • Published May 22, 2024 • 1
MMC: Advancing Multimodal Chart Understanding with Large-scale Instruction Tuning Paper • 2311.10774 • Published Nov 15, 2023 • 1
Visual News: Benchmark and Challenges in News Image Captioning Paper • 2010.03743 • Published Oct 8, 2020
DocumentCLIP: Linking Figures and Main Body Text in Reflowed Documents Paper • 2306.06306 • Published Jun 9, 2023
HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(ision), LLaVA-1.5, and Other Multi-modality Models Paper • 2310.14566 • Published Oct 23, 2023 • 27 • 6