---
license: apache-2.0
pipeline_tag: text-to-speech
tags:
- model_hub_mixin
- pytorch_model_hub_mixin
---
# Dia: Open-Weight Text-to-Speech Dialogue Model (1.6B)
Dia is a 1.6B parameter open-weight text-to-speech model developed by Nari Labs. It generates highly realistic dialogue directly from transcripts, with support for both spoken and nonverbal cues (e.g., `(laughs)`, `(sighs)`), and can be conditioned on audio for emotional tone or voice consistency.

Currently, Dia supports English and is optimized for GPU inference. This model is designed for research and educational purposes only.
## Try It Out
- ZeroGPU demo on Spaces
- Comparison demos with ElevenLabs and Sesame CSM-1B
- Try voice remixing and conversations with a larger version by joining the waitlist
- Join the community on Discord
## Capabilities
- Multispeaker support using `[S1]`, `[S2]`, etc.
- Rich nonverbal cue synthesis: `(laughs)`, `(clears throat)`, `(gasps)`, etc.
- Voice conditioning via transcript + audio example (see the sketch after the Python example below)
- Outputs high-fidelity `.mp3` files directly from text
Example input:

```
[S1] Dia is an open weights text-to-dialogue model. [S2] You get full control over scripts and voices. (laughs)
```
## Quickstart
Install via pip:

```bash
pip install git+https://github.com/nari-labs/dia.git
```
Launch the Gradio UI:

```bash
git clone https://github.com/nari-labs/dia.git
cd dia && uv run app.py
```
Or manually set up:

```bash
git clone https://github.com/nari-labs/dia.git
cd dia
python -m venv .venv
source .venv/bin/activate
pip install -e .
python app.py
```
## Python Example
```python
from dia.model import Dia

# Load the 1.6B checkpoint from the Hugging Face Hub in half precision.
model = Dia.from_pretrained("nari-labs/Dia-1.6B", compute_dtype="float16")

# [S1]/[S2] mark speaker turns; parenthesized cues like (laughs) are rendered as nonverbal sounds.
text = "[S1] Hello! This is Dia. [S2] Nice to meet you. (laughs)"
output = model.generate(text, use_torch_compile=True, verbose=True)
model.save_audio("output.mp3", output)
```
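To condition the output on a reference voice, prepend a transcript of the reference clip to the text you want generated and pass the reference audio alongside it. Below is a minimal sketch, assuming `generate` accepts an `audio_prompt` path as in the repository's voice-clone example; the file names are placeholders:

```python
from dia.model import Dia

model = Dia.from_pretrained("nari-labs/Dia-1.6B", compute_dtype="float16")

# Transcript of the reference clip, followed by the new lines to synthesize in that voice.
clone_transcript = "[S1] This is the reference recording used for voice conditioning."
new_lines = " [S1] And this sentence should come out in the same voice. (laughs)"

# `audio_prompt` and the file names are illustrative assumptions, not a fixed API.
output = model.generate(
    clone_transcript + new_lines,
    audio_prompt="reference.mp3",
    use_torch_compile=True,
    verbose=True,
)
model.save_audio("conditioned.mp3", output)
```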
Coming soon: PyPI package and CLI support
## Inference Performance (on RTX 4090)
| Precision | Realtime factor (w/ compile) | Realtime factor (w/o compile) | VRAM usage |
|---|---|---|---|
| bfloat16 | 2.1× | 1.5× | ~10 GB |
| float16 | 2.2× | 1.3× | ~10 GB |
| float32 | 1.0× | 0.9× | ~13 GB |
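Realtime factor is generated audio seconds per second of wall-clock time, so 2.1× means roughly a minute of dialogue in under 30 seconds. A rough way to check it on your own hardware is sketched below; it assumes `generate` returns a 1-D waveform at a 44.1 kHz output rate, so adjust the sample rate if your setup differs:

```python
import time

from dia.model import Dia

model = Dia.from_pretrained("nari-labs/Dia-1.6B", compute_dtype="bfloat16")
text = "[S1] Measuring throughput. [S2] A short exchange is enough for a rough number. (laughs)"

model.generate(text, use_torch_compile=True)  # warm-up run so compile time is not counted

start = time.time()
audio = model.generate(text, use_torch_compile=True)
elapsed = time.time() - start

# Assumes a 1-D waveform sampled at 44.1 kHz; change the rate if your checkpoint differs.
print(f"Realtime factor: {len(audio) / 44100 / elapsed:.2f}x")
```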
CPU support and a quantized version are coming soon.
## Ethical Use
This model is for research and educational use only. Prohibited uses include:
- Impersonating individuals (e.g., cloning real voices without consent)
- Generating misleading or malicious content
- Illegal or harmful activities
Please use responsibly.
## License
Apache 2.0
See the LICENSE for details.
## Roadmap
- Inference speed optimization
- CPU & quantized model support
- PyPI + CLI tools