Hugging Face
johannhartmann's Collections

Document & UI Intelligence

Updated Jan 20

  • xlangai/Aguvis-7B-720P

    Updated Jan 7 • 409 downloads • 7 likes

  • Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction

    Paper • arXiv 2412.04454 • Published Dec 5, 2024 • 66 upvotes

  • SeeClick: Harnessing GUI Grounding for Advanced Visual GUI Agents

    Paper • arXiv 2401.10935 • Published Jan 17, 2024 • 5 upvotes

  • cckevinn/SeeClick

    Text Generation • Updated Jan 29, 2024 • 8.87k downloads • 16 likes

  • jadechoghari/Ferret-UI-Llama8b

    Image-Text-to-Text • Updated Jan 8 • 257 downloads • 69 likes

  • Ferret-UI 2: Mastering Universal User Interface Understanding Across Platforms

    Paper • arXiv 2410.18967 • Published Oct 24, 2024 • 1 upvote

  • microsoft/OmniParser

    Image-Text-to-Text • Updated Dec 2, 2024 • 916 downloads • 1.66k likes

  • InfiGUIAgent: A Multimodal Generalist GUI Agent with Native Reasoning and Reflection

    Paper • arXiv 2501.04575 • Published Jan 8, 2025 • 24 upvotes

  • showlab/ShowUI-2B

    Updated Mar 11 • 10.2k downloads • 250 likes

  • AskUI/PTA-1

    Image-Text-to-Text • Updated Nov 28, 2024 • 16.8k downloads • 89 likes