oguzhanercan's Collections

Voice

updated 3 days ago

  • Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis

    Paper • 2412.15322 • Published Dec 19, 2024 • 18

  • Voila: Voice-Language Foundation Models for Real-Time Autonomous Interaction and Voice Role-Play

    Paper • 2505.02707 • Published 3 days ago • 70

  • LLaMA-Omni2: LLM-based Real-time Spoken Chatbot with Autoregressive Streaming Speech Synthesis

    Paper • 2505.02625 • Published 3 days ago • 16