D-FINE Collection State-of-the-art real-time object detection model with Apache 2.0 licence • 15 items • Updated 3 days ago • 48
BitNet v2: Native 4-bit Activations with Hadamard Transformation for 1-bit LLMs Paper • 2504.18415 • Published 13 days ago • 41
Many-Shot In-Context Learning in Multimodal Foundation Models Paper • 2405.09798 • Published May 16, 2024 • 33
MDocAgent: A Multi-Modal Multi-Agent Framework for Document Understanding Paper • 2503.13964 • Published Mar 18 • 19
LLaVA-Grounding: Grounded Visual Chat with Large Multimodal Models Paper • 2312.02949 • Published Dec 5, 2023 • 15
NitiBench: A Comprehensive Studies of LLM Frameworks Capabilities for Thai Legal Question Answering Paper • 2502.10868 • Published Feb 15 • 2
ViDoRAG: Visual Document Retrieval-Augmented Generation via Dynamic Iterative Reasoning Agents Paper • 2502.18017 • Published Feb 25 • 20
Scalable Vision Language Model Training via High Quality Data Curation Paper • 2501.05952 • Published Jan 10 • 2