metadata
title: Tortoise TTS API
emoji: 🦀
colorFrom: yellow
colorTo: purple
sdk: gradio
sdk_version: 5.23.1
app_file: app.py
pinned: false
license: apache-2.0
short_description: Text-to-speech using Gradio, FastAPI, and TorToise TTS
tags:
- tortoise-tts
- text-to-speech
- voice-cloning
- gradio
- fastapi
Tortoise TTS with Voice Cloning
A powerful text-to-speech application with voice cloning capabilities, powered by Tortoise-TTS.
Description
This application allows you to generate high-quality, natural-sounding speech from text. You can customize the voice by either:
- Uploading your own voice sample for cloning
- Recording your voice directly in the browser
- Selecting from a variety of preset voices
The app uses Tortoise-TTS, a high-quality text-to-speech model, and runs on Hugging Face Spaces with ZeroGPU.
How to Use
Web Interface
- Enter the text you want to convert to speech
- Choose one of the following voice options:
  - Upload a voice sample audio file (WAV format recommended)
  - Record your voice using your microphone
  - Select a preset voice from the dropdown menu
- Click "Generate Speech"
- Listen to or download the generated audio
API Endpoints
The app also provides REST API endpoints for programmatic access:
Voice File TTS - /api/tts_with_voice_file/
- POST request with:
  - text: Text to convert to speech (required)
  - voice_file: Audio file for voice cloning (optional)
  - preset_voice: Name of preset voice (optional, defaults to "random")
Preset Voice TTS - /api/tts_with_preset/
- POST request with:
  - text: Text to convert to speech (required)
  - preset_voice: Name of preset voice (required)
Python Example
import requests

# Using a preset voice
response = requests.post(
    "https://your-space-name.hf.space/api/tts_with_preset/",
    data={"text": "Hello, this is a test.", "preset_voice": "tom"},
)
response.raise_for_status()  # stop here if the request failed

# Save the audio file
with open("output.wav", "wb") as f:
    f.write(response.content)
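The voice-file endpoint can be called in a similar way. The following is a minimal sketch, not the app's reference client: it assumes the endpoint accepts a multipart form upload with the field names listed under API Endpoints, and that it returns raw audio bytes like the preset endpoint. The helper names, the "audio/wav" content type, and the 180-second timeout are illustrative choices.

```python
from pathlib import Path

# Placeholder host, as in the example above; replace it with your own Space URL.
BASE_URL = "https://your-space-name.hf.space"


def build_voice_file_request(text, voice_path):
    """Assemble the URL, form fields, and file upload for the voice-file endpoint."""
    path = Path(voice_path)
    url = f"{BASE_URL}/api/tts_with_voice_file/"
    data = {"text": text}
    # requests accepts (filename, content, content_type) tuples for uploads.
    # The "audio/wav" content type is an assumption based on the WAV recommendation.
    files = {"voice_file": (path.name, path.read_bytes(), "audio/wav")}
    return url, data, files


def tts_with_voice_file(text, voice_path, output_path="output.wav", timeout=180):
    """Send the request and save the returned audio; needs the requests package."""
    import requests  # third-party; imported here so the helper above has no dependencies

    url, data, files = build_voice_file_request(text, voice_path)
    # Generation can take 30-60 seconds, so the timeout is set well beyond that.
    response = requests.post(url, data=data, files=files, timeout=timeout)
    response.raise_for_status()
    Path(output_path).write_bytes(response.content)
    return output_path
```

For example, `tts_with_voice_file("Hello, this is a test.", "my_voice.wav")` would upload `my_voice.wav` as the cloning sample and write the result to `output.wav`, provided the assumptions above hold for your Space.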
Technical Details
This app leverages:
- Tortoise-TTS: State-of-the-art text-to-speech model
- Gradio: For the intuitive user interface
- FastAPI: For the API endpoints
- ZeroGPU: For efficient GPU utilization on Hugging Face Spaces
Limitations
- Speech generation may take some time (30-60 seconds) depending on text length
- Voice cloning quality depends on the clarity and length of the provided sample
- For best results, provide voice samples with clear speech and minimal background noise
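Since cloning quality depends partly on sample length, it can help to inspect a sample before uploading it. The sketch below uses only Python's standard `wave` module to read the duration and sample rate of an uncompressed PCM WAV file. The 6-second threshold and the function names are hypothetical, chosen for illustration; the app does not document a minimum length, and this check says nothing about clarity or background noise.

```python
import wave

# Hypothetical threshold for illustration; not a documented requirement of the app.
MIN_SECONDS = 6.0


def describe_wav(path):
    """Return channel count, sample rate, and duration of a PCM WAV file."""
    with wave.open(str(path), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate": rate,
            "seconds": frames / rate,
        }


def sample_warnings(path, min_seconds=MIN_SECONDS):
    """Return a list of human-readable warnings about a voice sample."""
    info = describe_wav(path)
    warnings = []
    if info["seconds"] < min_seconds:
        warnings.append(
            f"sample is only {info['seconds']:.1f}s long; "
            "a longer clip may give better cloning results"
        )
    return warnings
```

Note that `wave.open` raises `wave.Error` for files that are not uncompressed WAV, so compressed formats would need a different tool.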
Credits
This project uses the Tortoise-TTS model. If you use this app in your work, please consider citing:
@misc{tortoise-tts,
author = {James Betker},
title = {Tortoise-TTS: A Multi-Voice TTS System},
year = {2022},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/neonbjb/tortoise-tts}}
}
License
This project is available under the Apache-2.0 License.