---
license: apache-2.0
pipeline_tag: text-to-speech
tags:
- model_hub_mixin
- pytorch_model_hub_mixin
---

# Dia: Open-Weight Text-to-Speech Dialogue Model (1.6B)

**Dia** is a 1.6B-parameter open-weight text-to-speech model developed by Nari Labs. It generates highly realistic *dialogue* directly from transcripts, with support for both spoken and **nonverbal** cues (e.g., `(laughs)`, `(sighs)`), and can be **conditioned on audio** for emotional tone or voice consistency.

Currently, Dia supports **English** and is optimized for GPU inference. This model is designed for research and educational purposes only.

---

## 🔥 Try It Out

- 🖥️ [ZeroGPU demo on Spaces](https://huggingface.co/spaces/nari-labs/Dia-1.6B)
- 📊 [Comparison demos](https://yummy-fir-7a4.notion.site/dia) with ElevenLabs and Sesame CSM-1B
- 🎧 Try voice remixing and conversations with a larger version by [joining the waitlist](https://tally.so/r/meokbo)
- 💬 [Join the community on Discord](https://discord.gg/pgdB5YRe)

---

## 🧠 Capabilities

- Multi-speaker support using `[S1]`, `[S2]`, etc.
- Rich nonverbal cue synthesis: `(laughs)`, `(clears throat)`, `(gasps)`, etc.
- Voice conditioning via a transcript plus an audio example (see the sketch at the end of this card)
- Outputs high-fidelity `.mp3` files directly from text

Example input:

```text
[S1] Dia is an open weights text-to-dialogue model. [S2] You get full control over scripts and voices. (laughs)
```

---

## 🚀 Quickstart

Install via pip:

```bash
pip install git+https://github.com/nari-labs/dia.git
```

Launch the Gradio UI:

```bash
git clone https://github.com/nari-labs/dia.git
cd dia && uv run app.py
```

Or set up manually:

```bash
git clone https://github.com/nari-labs/dia.git
cd dia
python -m venv .venv
source .venv/bin/activate
pip install -e .
python app.py
```

---

## 🐍 Python Example

```python
from dia.model import Dia

model = Dia.from_pretrained("nari-labs/Dia-1.6B", compute_dtype="float16")

text = "[S1] Hello! This is Dia. [S2] Nice to meet you. (laughs)"
output = model.generate(text, use_torch_compile=True, verbose=True)

model.save_audio("output.mp3", output)
```

> Coming soon: PyPI package and CLI support

---

## 💻 Inference Performance (on RTX 4090)

| Precision | Realtime Factor (w/ compile) | Realtime Factor (w/o compile) | VRAM Usage |
|-----------|------------------------------|-------------------------------|------------|
| `bfloat16` | 2.1× | 1.5× | ~10 GB |
| `float16` | 2.2× | 1.3× | ~10 GB |
| `float32` | 1.0× | 0.9× | ~13 GB |

> CPU support and a quantized version are coming soon. A sketch for measuring the realtime factor yourself appears at the end of this card.

---

## ⚠️ Ethical Use

This model is for **research and educational use only**. Prohibited uses include:

- Impersonating individuals (e.g., cloning real voices without consent)
- Generating misleading or malicious content
- Illegal or harmful activities

Please use responsibly.

---

## 📄 License

Apache 2.0. See the [LICENSE](https://github.com/nari-labs/dia/blob/main/LICENSE) file for details.

---

## 🛠️ Roadmap

- 🔧 Inference speed optimization
- 💾 CPU & quantized model support
- 📦 PyPI + CLI tools
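
---

## 🎙️ Voice Conditioning Sketch

Voice conditioning (listed under Capabilities) works by prepending the transcript of a reference clip to the text you want generated and passing the clip itself to `generate`. Below is a minimal sketch following the voice-cloning example in the GitHub repository; the `audio_prompt` parameter name and the prompt format are assumptions drawn from that example, so check the repo's example scripts for the current API.

```python
from dia.model import Dia

model = Dia.from_pretrained("nari-labs/Dia-1.6B", compute_dtype="float16")

# Transcript of the reference clip, followed by the new lines to generate.
# The model continues in the voice(s) heard in the reference audio.
prompt_transcript = "[S1] This is the exact transcript of the reference clip."
text_to_generate = "[S1] And this is new dialogue in the same voice. (laughs)"

output = model.generate(
    prompt_transcript + " " + text_to_generate,
    audio_prompt="reference.mp3",  # assumed parameter name: path to the reference clip
    use_torch_compile=True,
    verbose=True,
)
model.save_audio("cloned_output.mp3", output)
```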
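
---

## ⏱️ Measuring Realtime Factor

To reproduce numbers like those in the performance table, divide the duration of the generated audio by the wall-clock generation time. This is a minimal sketch, assuming `generate` returns a 1-D audio array sampled at 44.1 kHz (the DAC codec rate); confirm the sample rate for your install before trusting the result.

```python
import time

from dia.model import Dia

model = Dia.from_pretrained("nari-labs/Dia-1.6B", compute_dtype="float16")
text = "[S1] Benchmarking Dia. [S2] Let's see how fast it runs. (laughs)"

# Warm-up run so torch.compile overhead is excluded from the measurement.
model.generate(text, use_torch_compile=True)

start = time.perf_counter()
output = model.generate(text, use_torch_compile=True)
elapsed = time.perf_counter() - start

SAMPLE_RATE = 44_100  # assumed output sample rate; verify against your version
audio_seconds = len(output) / SAMPLE_RATE
print(f"Generated {audio_seconds:.1f}s of audio in {elapsed:.1f}s "
      f"({audio_seconds / elapsed:.2f}x realtime)")
```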