darknoon's Collections

multimodal interesting

updated Oct 15, 2024

  • MUMU: Bootstrapping Multimodal Image Generation from Text-to-Image Data

    Paper • 2406.18790 • Published Jun 26, 2024 • 35

  • OmniGen: Unified Image Generation

    Paper • 2409.11340 • Published Sep 17, 2024 • 115

  • Show-o: One Single Transformer to Unify Multimodal Understanding and Generation

    Paper • 2408.12528 • Published Aug 22, 2024 • 52

  • MonoFormer/MonoFormer_ImageNet_256

    Updated Sep 25, 2024 • 3 • 5

  • A Spark of Vision-Language Intelligence: 2-Dimensional Autoregressive Transformer for Efficient Finegrained Image Generation

    Paper • 2410.01912 • Published Oct 2, 2024 • 14

  • Building and better understanding vision-language models: insights and future directions

    Paper • 2408.12637 • Published Aug 22, 2024 • 131

  • Aria: An Open Multimodal Native Mixture-of-Experts Model

    Paper • 2410.05993 • Published Oct 8, 2024 • 112

  • DART: Denoising Autoregressive Transformer for Scalable Text-to-Image Generation

    Paper • 2410.08159 • Published Oct 10, 2024 • 25