Ambroser53's Collections

Vision

updated Jul 22, 2024

  • InstructDoc: A Dataset for Zero-Shot Generalization of Visual Document Understanding with Instructions

    Paper • 2401.13313 • Published Jan 24, 2024 • 5

  • BAAI/Bunny-v1_0-4B

    Text Generation • Updated Jun 24, 2024 • 21 • 9

  • What matters when building vision-language models?

    Paper • 2405.02246 • Published May 3, 2024 • 104

  • Jina CLIP: Your CLIP Model Is Also Your Text Retriever

    Paper • 2405.20204 • Published May 30, 2024 • 37

  • Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model

    Paper • 2401.09417 • Published Jan 17, 2024 • 61

  • VoCo-LLaMA: Towards Vision Compression with Large Language Models

    Paper • 2406.12275 • Published Jun 18, 2024 • 32

  • PIN: A Knowledge-Intensive Dataset for Paired and Interleaved Multimodal Documents

    Paper • 2406.13923 • Published Jun 20, 2024 • 23

  • Instruction Pre-Training: Language Models are Supervised Multitask Learners

    Paper • 2406.14491 • Published Jun 20, 2024 • 94

  • ColPali: Efficient Document Retrieval with Vision Language Models

    Paper • 2407.01449 • Published Jun 27, 2024 • 48

  • VisFocus: Prompt-Guided Vision Encoders for OCR-Free Dense Document Understanding

    Paper • 2407.12594 • Published Jul 17, 2024 • 19