---
title: Tortoise TTS API
emoji: 🦀
colorFrom: yellow
colorTo: purple
sdk: gradio
sdk_version: 5.23.1
app_file: app.py
pinned: false
license: apache-2.0
short_description: Text-to-speech using Gradio, FastAPI, and TorToise TTS
tags:
- tortoise-tts
- text-to-speech
- voice-cloning
- gradio
- fastapi
---
# Tortoise TTS with Voice Cloning
A powerful text-to-speech application with voice cloning capabilities, powered by Tortoise-TTS.
## Description
This application generates high-quality, natural-sounding speech from text. You can choose the voice in one of three ways:
- Uploading your own voice sample for cloning
- Recording your voice directly in the browser
- Selecting from a variety of preset voices
The app uses Tortoise-TTS, a high-quality text-to-speech model, and runs on Hugging Face Spaces with ZeroGPU.
## How to Use
### Web Interface
1. Enter the text you want to convert to speech
2. Choose one of the following voice options:
   - Upload a voice sample audio file (WAV format recommended)
   - Record your voice using your microphone
   - Select a preset voice from the dropdown menu
3. Click "Generate Speech"
4. Listen to or download the generated audio
### API Endpoints
The app also provides REST API endpoints for programmatic access:
1. **Voice File TTS** - `/api/tts_with_voice_file/`
   - POST request with:
     - `text`: Text to convert to speech (required)
     - `voice_file`: Audio file for voice cloning (optional)
     - `preset_voice`: Name of preset voice (optional, defaults to "random")
2. **Preset Voice TTS** - `/api/tts_with_preset/`
   - POST request with:
     - `text`: Text to convert to speech (required)
     - `preset_voice`: Name of preset voice (required)
### Python Example
```python
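import requests

# Using a preset voice; replace the placeholder host with your own Space URL
response = requests.post(
    "https://your-space-name.hf.space/api/tts_with_preset/",
    data={"text": "Hello, this is a test.", "preset_voice": "tom"},
    timeout=300,  # generation can take 30-60 seconds or longer
)
response.raise_for_status()  # fail loudly instead of saving an error body as audio

# Save the audio file
with open("output.wav", "wb") as f:
    f.write(response.content)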
```
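The voice-file endpoint can be called in a similar way. The sketch below is not taken from the app's source: it assumes the endpoint accepts a multipart form upload with the field names listed above and, like the preset endpoint, returns the audio bytes in the response body. The helper names, the `BASE_URL` placeholder, and the `audio/wav` content type are illustrative choices.

```python
import os

import requests

# Placeholder host (assumption); replace with the URL of your own Space.
BASE_URL = "https://your-space-name.hf.space"


def build_voice_file_request(text, voice_bytes, filename="sample.wav", base_url=BASE_URL):
    """Prepare, but do not send, a multipart POST for the voice-file endpoint."""
    request = requests.Request(
        "POST",
        f"{base_url}/api/tts_with_voice_file/",
        data={"text": text},
        # Assumed: the sample is uploaded as a file part named `voice_file`.
        files={"voice_file": (filename, voice_bytes, "audio/wav")},
    )
    return request.prepare()


def synthesize_with_voice_file(text, voice_path, output_path="cloned.wav", timeout=300):
    """Send the prepared request and write the returned bytes to output_path."""
    with open(voice_path, "rb") as f:
        prepared = build_voice_file_request(
            text, f.read(), filename=os.path.basename(voice_path)
        )
    with requests.Session() as session:
        response = session.send(prepared, timeout=timeout)
    response.raise_for_status()
    # Assumed: the body is the generated audio, as in the preset example.
    with open(output_path, "wb") as out:
        out.write(response.content)
    return output_path
```

Calling `synthesize_with_voice_file("Hello, this is a test.", "my_voice.wav")` would upload the sample and save the result to `cloned.wav`. Building the request separately from sending it makes it possible to inspect the form fields without contacting the Space.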
## Technical Details
This app leverages:
- **Tortoise-TTS**: State-of-the-art text-to-speech model
- **Gradio**: For the intuitive user interface
- **FastAPI**: For the API endpoints
- **ZeroGPU**: For efficient GPU utilization on Hugging Face Spaces
## Limitations
- Speech generation may take some time (30-60 seconds), depending on text length
- Voice cloning quality depends on the clarity and length of the provided sample
- For best results, provide voice samples with clear speech and minimal background noise
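Sample length is easy to check before uploading. The following is a minimal sketch using Python's standard `wave` module; it only handles uncompressed PCM WAV files and cannot judge clarity or background noise. The function names and the `min_seconds` threshold are illustrative assumptions, not values documented by this app.

```python
import wave


def describe_wav(path):
    """Return (duration_seconds, sample_rate, channels) for a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        channels = wav.getnchannels()
    return frames / rate, rate, channels


def check_voice_sample(path, min_seconds=5.0):
    """Return a list of warnings; an empty list means no issue was detected.

    min_seconds is an arbitrary illustrative threshold (assumption).
    """
    duration, _rate, _channels = describe_wav(path)
    warnings = []
    if duration < min_seconds:
        warnings.append(
            f"sample is {duration:.1f}s long, shorter than {min_seconds:.1f}s"
        )
    return warnings
```

For example, `check_voice_sample("my_voice.wav")` returns an empty list when the clip meets the chosen minimum length, and a one-item list describing the problem otherwise.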
## Credits
This project uses the Tortoise-TTS model. If you use this app in your work, please consider citing:
```
@misc{tortoise-tts,
  author = {James Betker},
  title = {Tortoise-TTS: A Multi-Voice TTS System},
  year = {2022},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/neonbjb/tortoise-tts}}
}
```
## License
This project is available under the Apache-2.0 License.