aimagelab/LLaVA_MORE-llama_3_1-8B-S2-siglip-finetuning • Image-Text-to-Text • Updated 14 days ago • 24 • 2
aimagelab/LLaVA_MORE-llama_3_1-8B-siglip-finetuning • Image-Text-to-Text • Updated 14 days ago • 36 • 1
ReflectiVA • Collection • Models and data for ReflectiVA: Augmenting Multimodal LLMs with Self-Reflective Tokens for Knowledge-based Visual Question Answering [CVPR 2025] • 3 items • Updated Apr 5
ReT • Collection • Models and data for ReT: Recurrence-Enhanced Vision-and-Language Transformers for Robust Multimodal Document Retrieval [CVPR 2025] • 6 items • Updated Mar 29