zerozeyi's Collections
AudioLLM

updated Jul 29, 2024

  • GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities

    Paper • 2406.11768 • Published Jun 17, 2024 • 20

  • Investigating Decoder-only Large Language Models for Speech-to-text Translation

    Paper • 2407.03169 • Published Jul 3, 2024 • 11

  • PicoAudio: Enabling Precise Timestamp and Frequency Controllability of Audio Events in Text-to-audio Generation

    Paper • 2407.02869 • Published Jul 3, 2024 • 21

  • FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs

    Paper • 2407.04051 • Published Jul 4, 2024 • 40

  • Stable Audio Open

    Paper • 2407.14358 • Published Jul 19, 2024 • 27