---
title: Tortoise TTS API
emoji: 🦀
colorFrom: yellow
colorTo: purple
sdk: gradio
sdk_version: 5.23.1
app_file: app.py
pinned: false
license: apache-2.0
short_description: Text-to-speech using Gradio, FastAPI, and TorToise TTS
tags:
  - tortoise-tts
  - text-to-speech
  - voice-cloning
  - gradio
  - fastapi
---

# Tortoise TTS with Voice Cloning

A powerful text-to-speech application with voice cloning capabilities, powered by Tortoise-TTS.

## Description

This application allows you to generate high-quality, natural-sounding speech from text. You can customize the voice by either:
- Uploading your own voice sample for cloning
- Recording your voice directly in the browser
- Selecting from a variety of preset voices

The app uses Tortoise-TTS, a high-quality text-to-speech model, and runs on Hugging Face Spaces with ZeroGPU, which allocates a GPU only while a request is being processed.

## How to Use

### Web Interface

1. Enter the text you want to convert to speech
2. Choose one of the following voice options:
   - Upload a voice sample audio file (WAV format recommended)
   - Record your voice using your microphone
   - Select a preset voice from the dropdown menu
3. Click "Generate Speech"
4. Listen to or download the generated audio

### API Endpoints

The app also provides REST API endpoints for programmatic access:

1. **Voice File TTS** - `/api/tts_with_voice_file/`
   - POST request with:
     - `text`: Text to convert to speech (required)
     - `voice_file`: Audio file for voice cloning (optional)
     - `preset_voice`: Name of preset voice (optional, defaults to "random")

2. **Preset Voice TTS** - `/api/tts_with_preset/`
   - POST request with:
     - `text`: Text to convert to speech (required)
     - `preset_voice`: Name of preset voice (required)

### Python Example

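```python
import requests

# Request speech using a preset voice
response = requests.post(
    "https://your-space-name.hf.space/api/tts_with_preset/",
    data={"text": "Hello, this is a test.", "preset_voice": "tom"},
    timeout=300,  # generation can take 30-60 seconds or longer
)
response.raise_for_status()  # stop here if the API returned an error

# Save the audio file
with open("output.wav", "wb") as f:
    f.write(response.content)
```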
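
The voice file endpoint can be called in a similar way. The following is a minimal sketch, not the app's reference client: it assumes the endpoint accepts multipart form data whose field names match the parameter names listed above (`text`, `voice_file`, `preset_voice`) and that it returns raw audio bytes. The helper names `build_tts_request` and `synthesize`, the upload file name `sample.wav`, and the 300-second timeout are illustrative choices.

```python
from typing import Optional

# Placeholder host; replace with your own Space URL.
BASE_URL = "https://your-space-name.hf.space"


def build_tts_request(
    text: str,
    voice_bytes: Optional[bytes] = None,
    preset_voice: str = "random",
) -> dict:
    """Build keyword arguments for requests.post() (hypothetical helper)."""
    if not text.strip():
        raise ValueError("text is required")
    kwargs = {
        "url": f"{BASE_URL}/api/tts_with_voice_file/",
        "data": {"text": text, "preset_voice": preset_voice},
        "timeout": 300,  # assumed value; generation can be slow
    }
    if voice_bytes is not None:
        # Assumption: the sample is sent as a multipart file field.
        kwargs["files"] = {"voice_file": ("sample.wav", voice_bytes, "audio/wav")}
    return kwargs


def synthesize(text: str, voice_path: str, out_path: str = "cloned.wav") -> str:
    """Send a voice sample and save the returned audio (hypothetical helper)."""
    import requests  # imported here so build_tts_request works without it

    with open(voice_path, "rb") as f:
        kwargs = build_tts_request(text, f.read())
    response = requests.post(**kwargs)
    response.raise_for_status()
    with open(out_path, "wb") as out:
        out.write(response.content)
    return out_path
```

Calling `synthesize("Hello, this is a test.", "my_voice.wav")` would upload the sample and write the result to `cloned.wav`. If no voice file is supplied, the request falls back to the `preset_voice` value.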

## Technical Details

This app leverages:
- **Tortoise-TTS**: State-of-the-art text-to-speech model
- **Gradio**: For the intuitive user interface
- **FastAPI**: For the API endpoints
- **ZeroGPU**: For on-demand GPU allocation on Hugging Face Spaces

## Limitations

- Speech generation may take some time (30-60 seconds), depending on text length
- Voice cloning quality depends on the clarity and length of the provided sample
- For best results, provide voice samples with clear speech and minimal background noise
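
Since cloning quality depends on the sample, it can help to inspect a WAV file before uploading it. The sketch below uses only Python's standard `wave` module. The function names and the thresholds (5 seconds, 16 kHz) are illustrative assumptions, not requirements of this app, and the check cannot detect background noise.

```python
import wave


def describe_wav(source) -> dict:
    """Return basic properties of a WAV file (path or binary file object)."""
    with wave.open(source, "rb") as wav:
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate": rate,
            "duration_s": wav.getnframes() / rate,
        }


def sample_warnings(info: dict, min_seconds: float = 5.0, min_rate: int = 16000) -> list:
    """List possible problems; the default thresholds are assumed, not official."""
    warnings = []
    if info["duration_s"] < min_seconds:
        warnings.append(f"sample is short ({info['duration_s']:.1f} s)")
    if info["sample_rate"] < min_rate:
        warnings.append(f"low sample rate ({info['sample_rate']} Hz)")
    return warnings
```

For example, `sample_warnings(describe_wav("my_voice.wav"))` returns an empty list when the file passes both checks.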

## Credits

This project uses the Tortoise-TTS model. If you use this app in your work, please consider citing:

```bibtex
@misc{tortoise-tts,
  author = {James Betker},
  title = {Tortoise-TTS: A Multi-Voice TTS System},
  year = {2022},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/neonbjb/tortoise-tts}}
}
```

## License

This project is available under the Apache-2.0 License.