# 🐢 Tortoise

Tortoise is a very expressive TTS system with impressive voice cloning capabilities. It is based on a GPT-like autoregressive acoustic model that converts input text to discretized acoustic tokens, a diffusion model that converts these tokens to mel-spectrogram frames, and a UnivNet vocoder that converts the spectrograms to the final audio signal. The main downside is that Tortoise is very slow compared to parallel TTS models like VITS.

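
The three-stage data flow can be pictured with a toy sketch. Everything below is a hypothetical stand-in written for illustration: none of these functions exist in 🐸TTS, and the token values, frame width (`n_mels`), and hop size are arbitrary assumptions. It only shows the order of the stages and why the first one is sequential, since each token depends on the previous one.

```python
from typing import List


# Hypothetical stand-ins for illustration only; not 🐸TTS code.
def autoregressive_tokens(text: str) -> List[int]:
    """Stage 1 (toy): emit discrete 'acoustic tokens' one step at a time."""
    tokens: List[int] = []
    for ch in text:
        # Each step depends on the previous token, so steps cannot run in parallel.
        prev = tokens[-1] if tokens else 0
        tokens.append((prev + ord(ch)) % 256)
    return tokens


def diffusion_decode(tokens: List[int], n_mels: int = 4) -> List[List[float]]:
    """Stage 2 (toy): map every token to one mel-spectrogram-like frame."""
    return [[t / 255.0] * n_mels for t in tokens]


def vocode(mel: List[List[float]], hop: int = 2) -> List[float]:
    """Stage 3 (toy): expand every frame into `hop` waveform samples."""
    return [frame[0] for frame in mel for _ in range(hop)]


tokens = autoregressive_tokens("hi")
mel = diffusion_decode(tokens)
wav = vocode(mel)
print(len(tokens), len(mel), len(wav))
```

In the real model each stage is a neural network; the sketch keeps only the shape of the pipeline: text to tokens, tokens to frames, frames to samples.
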

Big thanks to 👑[@manmay-nakhashi](https://github.com/manmay-nakhashi) who helped us implement Tortoise in 🐸TTS.

Example use:

```python
from TTS.tts.configs.tortoise_config import TortoiseConfig
from TTS.tts.models.tortoise import Tortoise

config = TortoiseConfig()
model = Tortoise.init_from_config(config)
model.load_checkpoint(config, checkpoint_dir="paths/to/models_dir/", eval=True)

text = "This is an example."  # input text to synthesize
kwargs = {}  # optional inference settings forwarded to `synthesize`

# with random speaker
output_dict = model.synthesize(text, config, speaker_id="random", extra_voice_dirs=None, **kwargs)

# cloning a speaker
output_dict = model.synthesize(text, config, speaker_id="speaker_n", extra_voice_dirs="path/to/speaker_n/", **kwargs)
```

Using 🐸TTS API:

```python
from TTS.api import TTS

tts = TTS("tts_models/en/multi-dataset/tortoise-v2")

# cloning `lj` voice from `TTS/tts/utils/assets/tortoise/voices/lj`
# with custom inference settings overriding defaults.
tts.tts_to_file(text="Hello, my name is Manmay, how are you?",
                file_path="output.wav",
                voice_dir="path/to/tortoise/voices/dir/",
                speaker="lj",
                num_autoregressive_samples=1,
                diffusion_iterations=10)

# Using presets with the same voice
tts.tts_to_file(text="Hello, my name is Manmay, how are you?",
                file_path="output.wav",
                voice_dir="path/to/tortoise/voices/dir/",
                speaker="lj",
                preset="ultra_fast")

# Random voice generation
tts.tts_to_file(text="Hello, my name is Manmay, how are you?",
                file_path="output.wav")
```
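
Before passing `voice_dir` and `speaker`, it can help to check which speaker names a voices directory actually offers. The helper below is a minimal sketch, not part of 🐸TTS: `list_speakers` and `AUDIO_SUFFIXES` are hypothetical names, and it assumes the layout suggested by the `voices/lj` path above, namely one subfolder per speaker holding that speaker's reference clips. The set of audio extensions is also an assumption.

```python
from pathlib import Path
from typing import List

# Assumed clip formats; adjust to whatever your reference audio uses.
AUDIO_SUFFIXES = {".wav", ".mp3", ".flac"}


def list_speakers(voice_dir: str) -> List[str]:
    """Return the names of subfolders of `voice_dir` that contain at least one audio clip.

    Hypothetical helper: assumes one subfolder per speaker (e.g. `voices/lj/`).
    """
    root = Path(voice_dir)
    if not root.is_dir():
        return []
    speakers = []
    for sub in sorted(root.iterdir()):
        if not sub.is_dir():
            continue
        has_clip = any(
            f.is_file() and f.suffix.lower() in AUDIO_SUFFIXES for f in sub.iterdir()
        )
        if has_clip:
            speakers.append(sub.name)
    return speakers


# Prints an empty list if the placeholder path does not exist.
print(list_speakers("path/to/tortoise/voices/dir/"))
```

Any name returned here would be a candidate value for `speaker`, with the same directory passed as `voice_dir`.
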

Using 🐸TTS Command line:

```console
# cloning the `lj` voice
tts --model_name tts_models/en/multi-dataset/tortoise-v2 \
    --text "This is an example." \
    --out_path "output.wav" \
    --voice_dir path/to/tortoise/voices/dir/ \
    --speaker_idx "lj" \
    --progress_bar True

# Random voice generation
tts --model_name tts_models/en/multi-dataset/tortoise-v2 \
    --text "This is an example." \
    --out_path "output.wav" \
    --progress_bar True
```
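
When the command line is driven from a script, building the argument list in Python avoids shell-quoting mistakes in the text. This is a minimal sketch that uses only the flags shown above; `build_tortoise_command` is a hypothetical helper name, not something 🐸TTS provides.

```python
import shlex
from typing import List, Optional

MODEL_NAME = "tts_models/en/multi-dataset/tortoise-v2"


def build_tortoise_command(
    text: str,
    out_path: str = "output.wav",
    voice_dir: Optional[str] = None,
    speaker: Optional[str] = None,
) -> List[str]:
    """Assemble the `tts` CLI arguments for Tortoise (hypothetical helper).

    If both `voice_dir` and `speaker` are given, the voice is cloned;
    otherwise the flags are omitted and a random voice is generated.
    """
    cmd = ["tts", "--model_name", MODEL_NAME, "--text", text, "--out_path", out_path]
    if voice_dir and speaker:
        cmd += ["--voice_dir", voice_dir, "--speaker_idx", speaker]
    cmd += ["--progress_bar", "True"]
    return cmd


cmd = build_tortoise_command(
    "This is an example.",
    voice_dir="path/to/tortoise/voices/dir/",
    speaker="lj",
)
# Show the command as a shell-safe string; nothing is executed here.
print(shlex.join(cmd))
```

To run the command, pass the list to `subprocess.run(cmd, check=True)`; passing a list rather than a string means the text is never re-parsed by a shell.
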

## Important resources & papers
- Original Repo: https://github.com/neonbjb/tortoise-tts
- Faster implementation: https://github.com/152334H/tortoise-tts-fast
- UnivNet: https://arxiv.org/abs/2106.07889
- Latent Diffusion: https://arxiv.org/abs/2112.10752
- DALL-E: https://arxiv.org/abs/2102.12092

## TortoiseConfig
```{eval-rst}
.. autoclass:: TTS.tts.configs.tortoise_config.TortoiseConfig
    :members:
```

## TortoiseArgs
```{eval-rst}
.. autoclass:: TTS.tts.models.tortoise.TortoiseArgs
    :members:
```

## Tortoise Model
```{eval-rst}
.. autoclass:: TTS.tts.models.tortoise.Tortoise
    :members:
```