Linoy Tsaban

linoyts

AI & ML interests

None yet

Recent Activity

Organizations

Hugging Face · 🧨Diffusers · Hugging Face Internal Testing Organization · Huggingface Projects · Snap Research · Weizmann Institute of Science · Editing Images · leditsplusplus · Latent Consistency · Editing Audio · Women on Hugging Face · +RAIN film festival · diffusers-internal-dev · rnri-inversion · Snapchat Inc. · Latent Explorers · open/ acc · RF Inversion · FlowEdit · CRINGE · Réflexion IA · IP Composer · Inference Endpoints Images

linoyts's activity

reacted to AdinaY's post with 🔥 10 minutes ago
Kimi-Audio 🚀🎧 an OPEN audio foundation model released by Moonshot AI
moonshotai/Kimi-Audio-7B-Instruct
✨ 7B
✨ 13M+ hours of pretraining data
✨ Novel hybrid input architecture
✨ Universal audio capabilities (ASR, AQA, AAC, SER, SEC/ASC, end-to-end conversation)
reacted to luigi12345's post with 🔥 4 days ago
SkyReels-V2 INFINITE VIDEO 🔥♾️🎬 UNLIMITED-duration video generation model by Skywork.

> "It's finally here. An open-source model that achieves what we've all been waiting for: infinite-length videos." 😮

Skywork R1V: Pioneering Multimodal Reasoning with Chain-of-Thought (2504.05599)

Model: Skywork/SkyReels-V2-T2V-14B-720P

✨ 1.3B & 14B
✨ Generates infinite-length videos using Diffusion Forcing with diffusion models + autoregressive methods
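For intuition, here is a toy sketch of the chunked autoregressive rollout idea behind Diffusion Forcing: already-generated frames stay clean while each fresh chunk of noise is denoised conditioned on them, so the loop can run indefinitely. The `denoise` function below is a stand-in for illustration, not SkyReels-V2's actual model.

```python
import numpy as np

rng = np.random.default_rng(0)

def denoise(chunk, context, total_steps):
    # Stand-in denoiser: nudges noisy frames toward the context mean.
    # A real model would predict clean frames from (chunk, context, step).
    target = context.mean(axis=0, keepdims=True)
    return chunk + (1.0 / total_steps) * (target - chunk)

def rollout(num_frames, chunk_size=4, steps=8, shape=(8, 8)):
    frames = [rng.normal(size=shape)]  # seed frame
    while len(frames) < num_frames:
        chunk = rng.normal(size=(chunk_size, *shape))  # fresh noise
        context = np.stack(frames[-chunk_size:])       # clean history
        for _ in range(steps):                         # denoise new chunk only
            chunk = denoise(chunk, context, steps)
        frames.extend(chunk)                           # append and repeat
    return np.stack(frames[:num_frames])

video = rollout(num_frames=16)
print(video.shape)  # (16, 8, 8); nothing in the loop caps the length
```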
reacted to victor's post with 👍 4 days ago
DIA TTS is just amazing - please share your funniest gens (here is mine) 😂
nari-labs/Dia-1.6B
reacted to AdinaY's post with 🔥 5 days ago
MAGI-1 🪄 the autoregressive diffusion video model, released by Sand AI

sand-ai/MAGI-1

✨ 24B with Apache 2.0
✨ Strong temporal consistency
✨ Benchmark-topping performance
posted an update 6 days ago
reacted to fdaudens's post with 🤯 18 days ago
🎨 Designers, meet OmniSVG! This new model helps you create professional vector graphics from text/images, generate editable SVGs from icons to detailed characters, convert rasters to vectors, maintain style consistency with references, and integrate into your workflow.

@OmniSVG
reacted to ajibawa-2023's post with 🔥 18 days ago
Hi All, I recently released two audio datasets which were generated using my earlier released dataset: ajibawa-2023/Children-Stories-Collection

First Audio Dataset: https://huggingface.co/datasets/ajibawa-2023/Audio-Children-Stories-Collection-Large has 5,600+ stories in .mp3 format.

Second Audio Dataset: https://huggingface.co/datasets/ajibawa-2023/Audio-Children-Stories-Collection has 600 stories in .mp3 format.
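A minimal loading sketch with 🤗 Datasets, assuming the default split layout (check the dataset viewer for exact column names):

```python
from datasets import load_dataset

# Assumes a "train" split; see the dataset card if the layout differs.
ds = load_dataset("ajibawa-2023/Audio-Children-Stories-Collection", split="train")
print(ds)     # features and row count
print(ds[0])  # one story record, including its .mp3 audio entry
```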
reacted to AdinaY's post with 🔥 24 days ago
reacted to seawolf2357's post with 🔥 25 days ago
🎨 Ghibli-Style Image Generation with Multilingual Text Integration: FLUX.1 Hugging Face Edition 🌍✨

Hello creators! Today I'm introducing a special image generator that combines the beautiful aesthetics of Studio Ghibli with multilingual text integration! 😍

seawolf2357/Ghibli-Multilingual-Text-rendering

✨ Key Features

- Ghibli-Style Image Generation - High-quality animation-style images based on FLUX.1
- Multilingual Text Rendering - Support for Korean, Japanese, English, and all languages! 🇰🇷🇯🇵🇬🇧
- Automatic Image Editing with Simple Prompts - Just input your desired text and you're done!
- Two Stylistic Variations Provided - Get two different results from a single prompt
- Full Hugging Face Spaces Support - Deploy and share instantly!

🚀 How Does It Work?

1. Enter a prompt describing your desired image (e.g., "a cat sitting by the window")
2. Input the text you want to add (any language works!)
3. Select the text position, size, and color
4. Two different versions are automatically generated!
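To call the Space programmatically instead of clicking through the UI, here is a hedged gradio_client sketch; the endpoint name and argument order below are guesses, and `view_api()` prints the real signature:

```python
from gradio_client import Client

client = Client("seawolf2357/Ghibli-Multilingual-Text-rendering")
client.view_api()  # prints the Space's actual endpoints and arguments

# Hypothetical call shape mirroring the four steps above:
# result = client.predict(
#     "a cat sitting by the window",  # step 1: image prompt
#     "안녕하세요!",                   # step 2: text to render (any language)
#     "bottom", 48, "#ffffff",        # step 3: position, size, color
#     api_name="/generate",           # guessed endpoint name
# )
```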

💯 Advantages of This Model

- No Tedious Post-Editing Needed - Text is perfectly integrated during generation
- Natural Text Integration - Text automatically adjusts to match the image style
- Perfect Multilingual Support - Any language renders beautifully!
- User-Friendly Interface - Easily adjust text size, position, and color
- One-Click Hugging Face Deployment - Use immediately without complex setup

🎭 Use Cases

- Creating multilingual greeting cards
- Animation-style social media content
- Ghibli-inspired posters or banners
- Character images with dialogue in various languages
- Sharing with the community through Hugging Face Spaces

This project leverages Hugging Face's FLUX.1 model to open new possibilities for seamlessly integrating high-quality Ghibli-style images with multilingual text using just prompts! 🌈
Try it now and create your own artistic masterpieces! 🎨✨

#GhibliStyle #MultilingualSupport #AIImageGeneration #TextRendering #FLUX #HuggingFace
reacted to ZhiyuanthePony's post with 🤗 27 days ago
🎉 Thrilled to share our #CVPR2025 accepted work:
Progressive Rendering Distillation: Adapting Stable Diffusion for Instant Text-to-Mesh Generation without 3D Data (2503.21694)

🔥 Key Innovations:
1️⃣ First to adapt SD for direct textured mesh generation (1-2s inference)
2️⃣ Novel teacher-student framework leveraging multi-view diffusion models ([MVDream](https://arxiv.org/abs/2308.16512) & [RichDreamer](https://arxiv.org/abs/2311.16918))
3️⃣ Parameter-efficient tuning - only +2.6% params over base SD
4️⃣ 3D data-free training liberates the model from dataset constraints

💡 Why does it matter?
→ A novel 3D-data-free paradigm
→ Outperforms data-driven methods on creative concept generation
→ Unlocks web-scale text corpora for 3D content creation

๐ŸŒ Project: https://theericma.github.io/TriplaneTurbo/
๐ŸŽฎ Demo: ZhiyuanthePony/TriplaneTurbo
๐Ÿ’ป Code: https://github.com/theEricMa/TriplaneTurbo
reacted to prithivMLmods's post with 👍 about 1 month ago
Dropping downstream tasks using newly initialized parameters and weights ([classifier.bias & weights]) to support domain-specific **image classification**. Based on siglip2-base-patch16-224 and DomainNet (single-domain, multi-source adaptation), with Fashion-MNIST & more for experimental testing. 🧤☄️

Fashion-Mnist : prithivMLmods/Fashion-Mnist-SigLIP2
Mnist-Digits : prithivMLmods/Mnist-Digits-SigLIP2
Multisource-121 : prithivMLmods/Multisource-121-DomainNet
Painting-126 : prithivMLmods/Painting-126-DomainNet
Sketch-126 : prithivMLmods/Sketch-126-DomainNet
Clipart-126 : prithivMLmods/Clipart-126-DomainNet

Models are trained with different parameter settings for experimental purposes only, with the intent of further development. Refer to the model page below for instructions on running them with Transformers 🤗.
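Until you check the model page, a minimal sketch assuming the checkpoints ship a standard image-classification head and label map:

```python
from transformers import pipeline

# Any of the checkpoints above should slot in the same way.
classifier = pipeline(
    "image-classification",
    model="prithivMLmods/Fashion-Mnist-SigLIP2",
)
print(classifier("sneaker.jpg")[:3])  # top labels and scores for a local image
```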

Collection : prithivMLmods/domainnet-0324-67e0e3c934c03cc40c6c8782

Citations : SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features https://arxiv.org/pdf/2502.14786 & Moment Matching for Multi-Source Domain Adaptation : https://arxiv.org/pdf/1812.01754

reacted to Yehor's post with 👍 about 2 months ago
Published a stable version of the Ukrainian Text-to-Speech library on GitHub and PyPI.

Features:

- Multi-speaker model: 2 female (Tetiana, Lada) + 1 male (Mykyta) voices;
- Fine-grained control over speech parameters, including duration, fundamental frequency (F0), and energy;
- High-fidelity speech generation using the RAD-TTS++ acoustic model;
- Fast vocoding using Vocos;
- Synthesizes long sentences effectively;
- Supports a sampling rate of 44.1 kHz;
- Tested on Linux environments and Windows/WSL;
- Python API (requires Python 3.9 or later);
- CUDA-enabled for GPU acceleration.

Repository: https://github.com/egorsmkv/tts_uk
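A hypothetical usage sketch only: the entry point and argument names below are illustrative guesses, not the library's confirmed API; the repository README documents the real calls.

```python
import soundfile as sf
from tts_uk import synthesize  # hypothetical entry point

audio, sample_rate = synthesize(
    text="Привіт, світе!",  # "Hello, world!" in Ukrainian
    voice="tetiana",        # tetiana / lada (female) or mykyta (male)
    device="cuda",          # CUDA-enabled; assume a CPU fallback
)
sf.write("story.wav", audio, sample_rate)  # 44.1 kHz output per the post
```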
reacted to freddyaboulton's post with 🚀 2 months ago
Getting WebRTC and WebSockets right in Python is very tricky. If you've tried to wrap an LLM in a real-time audio layer then you know what I'm talking about.

That's where FastRTC comes in! It makes WebRTC and WebSocket streams super easy with minimal code and overhead.

Check out our org: hf.co/fastrtc
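A minimal echo-stream sketch along the lines of the FastRTC quickstart; swap `echo` for an LLM-backed handler to get a real-time voice agent:

```python
from fastrtc import ReplyOnPause, Stream

def echo(audio):
    # audio arrives as a (sample_rate, numpy_array) tuple once the caller
    # pauses; yield one or more chunks back as the spoken reply.
    yield audio

stream = Stream(handler=ReplyOnPause(echo), modality="audio", mode="send-receive")
stream.ui.launch()  # serves a WebRTC-backed Gradio UI
```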
reacted to burtenshaw's post with 🔥 2 months ago
Now the Hugging Face agent course is getting real! With frameworks like smolagents, LlamaIndex, and LangChain.

🔗 Follow the org for updates: agents-course

This week we are releasing the first framework unit in the course and it's on smolagents. This is what the unit covers:

- why should you use smolagents vs another library?
- how to build agents that use code
- build multi-agent systems
- use vision language models for browser use

The team has been working flat out on this for a few weeks. Led by @sergiopaniego and supported by smolagents author @m-ric.
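For a taste of what the unit covers, a minimal code-agent sketch assuming the course-era smolagents API (`HfApiModel` has been renamed in newer releases, so adjust to your installed version):

```python
from smolagents import CodeAgent, DuckDuckGoSearchTool, HfApiModel

# A code agent writes and executes Python to solve the task,
# optionally calling the search tool along the way.
agent = CodeAgent(tools=[DuckDuckGoSearchTool()], model=HfApiModel())
agent.run("How long would it take a cheetah at top speed to cross the Golden Gate Bridge?")
```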
reacted to AdinaY's post with ❤️ 2 months ago
🚀 StepFun 阶跃星辰 is making BIG open moves!

Last year, their GOT-OCR 2.0 took the community by storm 🔥 but many didn't know they were also building some amazing models. Now, they've just dropped something huge on the Hub!

📺 Step-Video-T2V: a 30B bilingual open video model that generates 204 frames (8-10s) at 540P resolution with high information density & consistency.
stepfun-ai/stepvideo-t2v

🔊 Step-Audio-TTS-3B: a TTS model trained with the LLM-Chat paradigm on a large synthetic dataset, capable of generating rap & humming
stepfun-ai/step-audio-67b33accf45735bb21131b0b
reacted to davidberenstein1957's post with 🤗 3 months ago
reacted to not-lain's post with 🔥 3 months ago
reacted to burtenshaw's post with 🔥 3 months ago
We're launching a FREE and CERTIFIED course on Agents!

We're thrilled to announce the launch of the Hugging Face Agents course on Learn! This interactive, certified course will guide you through building and deploying your own AI agents.

Here's what you'll learn:

- Understanding Agents: We'll break down the fundamentals of AI agents, showing you how they use LLMs to perceive their environment (observations), reason about it (thoughts), and take actions. Think of a smart assistant that can book appointments, answer emails, or even write code based on your instructions.
- Building with Frameworks: You'll dive into popular agent frameworks like LangChain, LlamaIndex and smolagents. These tools provide the building blocks for creating complex agent behaviors.
- Real-World Applications: See how agents are used in practice, from automating SQL queries to generating code and summarizing complex documents.
- Certification: Earn a certification by completing the course modules, implementing a use case, and passing a benchmark assessment. This proves your skills in building and deploying AI agents.
Audience

This course is designed for anyone interested in the future of AI. Whether you're a developer, data scientist, or simply curious about AI, this course will equip you with the knowledge and skills to build your own intelligent agents.

Enroll today and start building the next generation of AI agent applications!

https://bit.ly/hf-learn-agents
reacted to alibabasglab's post with 🔥 3 months ago
We are thrilled to present the improved "ClearerVoice-Studio", an open-source platform designed to make speech processing easy to use for everyone! Whether you're working on speech enhancement, speech separation, speech super-resolution, or target speaker extraction, this unified platform has you covered.

**Why Choose ClearerVoice-Studio?**

- Pre-Trained Models: Includes cutting-edge pre-trained models, fine-tuned on extensive, high-quality datasets. No need to start from scratch!
- Ease of Use: Designed for seamless integration with your projects, offering a simple yet flexible interface for inference and training.

**Where to Find Us?**

- GitHub Repository: ClearerVoice-Studio (https://github.com/modelscope/ClearerVoice-Studio)
- Try Our Demo: Hugging Face Space (alibabasglab/ClearVoice)

**What Can You Do with ClearerVoice-Studio?**

- Enhance noisy speech recordings to achieve crystal-clear quality.
- Separate speech from complex audio mixtures with ease.
- Transform low-resolution audio into high-resolution audio. A full upscaled LJSpeech-1.1-48kHz dataset can be downloaded from alibabasglab/LJSpeech-1.1-48kHz.
- Extract target speaker voices with precision using audio-visual models.
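A minimal sketch following the usage pattern in the repository README; the task and model names are examples, and the README lists the full set:

```python
from clearvoice import ClearVoice

# Speech enhancement with one of the pre-trained checkpoints.
cv = ClearVoice(task="speech_enhancement", model_names=["MossFormer2_SE_48K"])
enhanced = cv(input_path="noisy_speech.wav", online_write=False)
cv.write(enhanced, output_path="enhanced_speech.wav")
```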

**Join Us in Growing ClearerVoice-Studio!**

We believe in the power of open-source collaboration. By starring our GitHub repository and sharing ClearerVoice-Studio with your network, you can help us grow this community-driven platform.

**Support us by:**

- Starring it on GitHub.
- Exploring and contributing to our codebase.
- Sharing your feedback and use cases to make the platform even better.
- Joining our community discussions to exchange ideas and innovations.

Together, let's push the boundaries of speech processing! Thank you for your support! 💖
reacted to Tonic's post with 🔥 3 months ago
🙋🏻‍♂️ Hey there folks, Open LLM Europe just released the Lucie 7B-Instruct model, a bilingual instruct model trained on open data! You can check out my unofficial demo here while we wait for the official inference API from the group: Tonic/Lucie-7B. Hope you like it 🚀