Awario (Awario)

liked a model 13 days ago

Shakker-Labs/FLUX.1-dev-ControlNet-Union-Pro-2.0

Text-to-Image • Updated 14 days ago • 35.1k • 249

liked 2 Spaces about 1 month ago

13

ControlNet v1.1 Annotators cpu

📊

Generate control images for Stable Diffusion using various detectors

25

ImageProcessor

🐢

Turn images into annotated images with various effects

liked a model about 1 month ago

lllyasviel/Annotators

Updated Aug 27, 2023 • 343

liked a Space about 1 month ago

3

Anilines

⚡

Anime Line Extractor

liked a model about 2 months ago

tencent/Hunyuan3D-2mv

Image-to-3D • Updated Mar 19 • 9.17k • 374

liked a model 2 months ago

ali-vilab/ACE_Plus

Updated Mar 14 • 149 • 225

liked a dataset 2 months ago

GeneralReasoning/GeneralThought-195K

Viewer • Updated Mar 10 • 195k • 374 • 69

upvoted an article 2 months ago

Article

The AI tools for Art Newsletter - Issue 1

Jan 31

• 77

reacted to alibabasglab's post with 👍 4 months ago

Post

2617

We are thrilled to present the improved "ClearerVoice-Studio", an open-source platform designed to make speech processing easy to use for everyone! Whether you’re working on speech enhancement, speech separation, speech super-resolution, or target speaker extraction, this unified platform has you covered.

**Why Choose ClearerVoice-Studio?**

- Pre-Trained Models: Includes cutting-edge pre-trained models, fine-tuned on extensive, high-quality datasets. No need to start from scratch!
- Ease of Use: Designed for seamless integration with your projects, offering a simple yet flexible interface for inference and training.

**Where to Find Us?**

- GitHub Repository: ClearerVoice-Studio (https://github.com/modelscope/ClearerVoice-Studio)
- Try Our Demo: Hugging Face Space (alibabasglab/ClearVoice)

**What Can You Do with ClearerVoice-Studio?**

- Enhance noisy speech recordings to achieve crystal-clear quality.
- Separate speech from complex audio mixtures with ease.
- Transform low-resolution audio into high-resolution audio. A full upscaled LJSpeech-1.1-48kHz dataset can be downloaded from alibabasglab/LJSpeech-1.1-48kHz.
- Extract target speaker voices with precision using audio-visual models.

**Join Us in Growing ClearerVoice-Studio!**

We believe in the power of open-source collaboration. By starring our GitHub repository and sharing ClearerVoice-Studio with your network, you can help us grow this community-driven platform.

**Support us by:**

- Starring it on GitHub.
- Exploring and contributing to our codebase.
- Sharing your feedback and use cases to make the platform even better.
- Joining our community discussions to exchange ideas and innovations.

Together, let’s push the boundaries of speech processing! Thank you for your support! 💖

reacted to sanaka87's post with 🔥 4 months ago

Post

1813

🚀 Excited to Share Our Latest Work: 3DIS & 3DIS-FLUX for Multi-Instance Layout-to-Image Generation! ❤️❤️❤️

🎨 Daily Paper: 3DIS-FLUX: simple and efficient multi-instance generation with DiT rendering (2501.05131)
🔓 Code is now open source!
🌐 Project Website: https://limuloo.github.io/3DIS/
🏠 GitHub Repository: https://github.com/limuloo/3DIS
📄 3DIS Paper: https://arxiv.org/abs/2410.12669
📄 3DIS-FLUX Tech Report: https://arxiv.org/abs/2501.05131

🔥 Why 3DIS & 3DIS-FLUX?
Current SOTA multi-instance generation methods are typically adapter-based, requiring additional control modules trained on pre-trained models for layout and instance attribute control. However, with the emergence of more powerful models like FLUX and SD3.5, these methods demand constant retraining and extensive resources.

✨ Our Solution: 3DIS
We introduce a decoupled approach that only requires training a low-resolution Layout-to-Depth model to convert layouts into coarse-grained scene depth maps. Leveraging community and company pre-trained models like ControlNet + SAM2, we enable training-free controllable image generation on high-resolution models such as SDXL and FLUX.

🌟 Benefits of Our Decoupled Multi-Instance Generation:
1. Enhanced Control: By constructing scenes using depth maps in the first stage, the model focuses on coarse-grained scene layout, improving control over instance placement.
2. Flexibility & Preservation: The second stage employs training-free rendering methods, allowing seamless integration with various models (e.g., fine-tuned weights, LoRA) while maintaining the generative capabilities of pre-trained models.

Join us in advancing Layout-to-Image Generation! Follow and star our repository to stay updated! ⭐
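
The two-stage split described in the post above can be illustrated with a toy sketch. This is not the 3DIS code: the real first stage is a trained Layout-to-Depth model and the real second stage uses pre-trained models such as ControlNet + SAM2. Here, stage 1 is replaced by plain box rasterization and stage 2 by a dummy callable, and every name (`Instance`, `layout_to_depth`, `render`, `dummy_renderer`) is hypothetical. The only point is to show how a coarse depth map decouples layout control from whichever renderer is plugged in.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

DepthMap = List[List[float]]


@dataclass
class Instance:
    box: Tuple[float, float, float, float]  # (x0, y0, x1, y1), normalized to [0, 1]
    prompt: str                             # per-instance attribute description
    depth: float                            # hypothetical depth value; larger = nearer


def layout_to_depth(instances: List[Instance], size: int = 8) -> DepthMap:
    """Stand-in for stage 1: turn a layout into a coarse, low-resolution depth map.

    Assumption: simple box rasterization replaces the trained Layout-to-Depth model.
    """
    depth = [[0.0] * size for _ in range(size)]
    for inst in instances:
        x0, y0, x1, y1 = inst.box
        for row in range(int(y0 * size), int(y1 * size)):
            for col in range(int(x0 * size), int(x1 * size)):
                # Where boxes overlap, the nearer instance wins.
                depth[row][col] = max(depth[row][col], inst.depth)
    return depth


def render(depth: DepthMap, instances: List[Instance],
           renderer: Callable[[DepthMap, List[str]], dict]) -> dict:
    """Stage 2: pass the depth map to any frozen, depth-conditioned renderer.

    The renderer is just a callable, so swapping the backbone needs no retraining here.
    """
    return renderer(depth, [inst.prompt for inst in instances])


def dummy_renderer(depth: DepthMap, prompts: List[str]) -> dict:
    """Placeholder for a depth-conditioned image model; returns metadata only."""
    return {"depth_shape": (len(depth), len(depth[0])), "prompts": prompts}


if __name__ == "__main__":
    layout = [
        Instance((0.0, 0.0, 0.5, 0.5), "a cat", 0.8),
        Instance((0.25, 0.25, 1.0, 1.0), "a dog", 0.4),
    ]
    coarse = layout_to_depth(layout, size=4)
    print(render(coarse, layout, dummy_renderer))
```

Because the renderer is passed in as an argument, changing the second stage in this sketch means changing one callable, which mirrors the flexibility the post claims for the decoupled design.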

upvoted a paper 4 months ago

TransPixar: Advancing Text-to-Video Generation with Transparency

Paper • 2501.03006 • Published Jan 6 • 27

liked 4 models 4 months ago

reacted to MonsterMMORPG's post with 👍 4 months ago

Post

1814

Famous IC-Light - Relight Images - Advanced Gradio APP with Windows, RunPod, Massed Compute and Free Kaggle Account Installers Published

Installers are shared here : https://www.patreon.com/posts/famous-ic-light-119566071

1-Click to install and use on Windows, RunPod, Massed Compute and a free Kaggle account notebook

Everything works perfectly, with a more advanced Gradio app than the one published in the official repo: https://github.com/lllyasviel/IC-Light

Moreover,

I started another experimental product training run for a client, doing FLUX DreamBooth / fine-tuning via the Kohya SS GUI. The GPU is an L40S and the batch size is 7. Config name: Batch_Size_7_48GB_GPU_46250MB_29.1_second_it_Tier_1.json

Full workflow, step by step tutorial and configs : https://youtu.be/FvpWy1x5etM

Check out the attached images in full resolution for more info