license: apache-2.0
pipeline_tag: text-to-speech
tags:
  - model_hub_mixin
  - pytorch_model_hub_mixin

Dia: Open-Weight Text-to-Speech Dialogue Model (1.6B)

Dia is a 1.6B parameter open-weight text-to-speech model developed by Nari Labs.
It generates highly realistic dialogue directly from transcripts, supports both spoken text and nonverbal cues (e.g., (laughs), (sighs)), and can be conditioned on audio to control emotional tone or keep a voice consistent.

Currently, Dia supports English and is optimized for GPU inference. This model is designed for research and educational purposes only.


🔥 Try It Out


🧠 Capabilities

  • Multispeaker support using [S1], [S2], etc.
  • Rich nonverbal cue synthesis: (laughs), (clears throat), (gasps), etc.
  • Voice conditioning (via transcript + audio example)
  • Outputs high-fidelity .mp3 files directly from text

Example input:

[S1] Dia is an open weights text-to-dialogue model. [S2] You get full control over scripts and voices. (laughs)
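
The speaker tags and parenthesized cues in the example input can also be assembled programmatically. The following is a minimal sketch, not part of the dia package: `format_dialogue` and `speakers_in` are hypothetical helper names introduced here for illustration, and the sketch assumes speakers are numbered from 1 as in [S1], [S2].

```python
import re

# Hypothetical helpers, not part of the dia package. They build and inspect
# transcripts in the "[S1] ... [S2] ..." format shown in the example input.
SPEAKER_TAG = re.compile(r"\[S(\d+)\]")


def format_dialogue(turns):
    """Join (speaker_number, utterance) pairs into one tagged transcript."""
    parts = []
    for speaker, utterance in turns:
        if speaker < 1:
            raise ValueError("speaker numbers start at 1")
        parts.append(f"[S{speaker}] {utterance.strip()}")
    return " ".join(parts)


def speakers_in(transcript):
    """Return the sorted speaker numbers that appear in a tagged transcript."""
    return sorted({int(n) for n in SPEAKER_TAG.findall(transcript)})


text = format_dialogue([
    (1, "Dia is an open weights text-to-dialogue model."),
    (2, "You get full control over scripts and voices. (laughs)"),
])
print(text)
```

The resulting string can be passed as the text argument in the Python example; nonverbal cues such as (laughs) are simply left inline in the utterance.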

🚀 Quickstart

Install via pip:

pip install git+https://github.com/nari-labs/dia.git

Launch the Gradio UI:

git clone https://github.com/nari-labs/dia.git
cd dia && uv run app.py

Or manually set up:

git clone https://github.com/nari-labs/dia.git
cd dia
python -m venv .venv
source .venv/bin/activate
pip install -e .
python app.py

🐍 Python Example

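from dia.model import Dia

# Load the pretrained weights from the Hugging Face Hub in half precision.
model = Dia.from_pretrained("nari-labs/Dia-1.6B", compute_dtype="float16")

# [S1]/[S2] mark speaker turns; parenthesized cues such as (laughs) are nonverbal.
text = "[S1] Hello! This is Dia. [S2] Nice to meet you. (laughs)"
# Generate audio with compilation enabled and verbose output.
output = model.generate(text, use_torch_compile=True, verbose=True)
# Write the generated audio to disk.
model.save_audio("output.mp3", output)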

Coming soon: PyPI package and CLI support


💻 Inference Performance (on RTX 4090)

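| Precision | Realtime Factor (w/ compile) | Realtime Factor (w/o compile) | VRAM Usage |
|-----------|------------------------------|-------------------------------|------------|
| bfloat16  | 2.1×                         | 1.5×                          | ~10GB      |
| float16   | 2.2×                         | 1.3×                          | ~10GB      |
| float32   | 1.0×                         | 0.9×                          | ~13GB      |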

CPU support and quantized version coming soon.
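
To make the RTX 4090 figures concrete, here is a rough sketch that turns them into estimated generation times and a VRAM check. It assumes that the realtime factor means seconds of audio produced per second of compute, which the README does not state explicitly, and it treats the approximate VRAM figures as exact. `BENCHMARKS`, `estimated_seconds`, and `precisions_fitting` are illustrative names, not part of the dia package.

```python
# Figures copied from the RTX 4090 measurements in this README.
# Assumption: realtime factor = seconds of audio produced per second of compute.
# VRAM values are approximate ("~10GB", "~13GB") and treated as exact here.
BENCHMARKS = {
    "bfloat16": {"compiled": 2.1, "uncompiled": 1.5, "vram_gb": 10},
    "float16": {"compiled": 2.2, "uncompiled": 1.3, "vram_gb": 10},
    "float32": {"compiled": 1.0, "uncompiled": 0.9, "vram_gb": 13},
}


def estimated_seconds(audio_seconds, precision, compiled=True):
    """Estimate wall-clock generation time for a clip of the given length."""
    key = "compiled" if compiled else "uncompiled"
    return audio_seconds / BENCHMARKS[precision][key]


def precisions_fitting(vram_gb):
    """List precisions whose approximate VRAM usage fits the given budget."""
    return [p for p, row in BENCHMARKS.items() if row["vram_gb"] <= vram_gb]


# Estimated time to generate 30 seconds of audio in float16 with compile.
print(round(estimated_seconds(30, "float16"), 1))
# Precisions expected to fit on a 12 GB card.
print(precisions_fitting(12))
```

Under these assumptions, float32 with compile runs at roughly realtime, while the half-precision modes generate audio about twice as fast and use less memory.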


⚠️ Ethical Use

This model is for research and educational use only. Prohibited uses include:

  • Impersonating individuals (e.g., cloning real voices without consent)
  • Generating misleading or malicious content
  • Illegal or harmful activities

Please use responsibly.


📄 License

Apache 2.0
See the LICENSE file for details.


πŸ› οΈ Roadmap

  • πŸ”§ Inference speed optimization
  • πŸ’Ύ CPU & quantized model support
  • πŸ“¦ PyPI + CLI tools