metadata
title: Tortoise TTS API
emoji: 🦀
colorFrom: yellow
colorTo: purple
sdk: gradio
sdk_version: 5.23.1
app_file: app.py
pinned: false
license: apache-2.0
short_description: Text-to-speech using Gradio, FastAPI, and TorToise TTS
tags:
  - tortoise-tts
  - text-to-speech
  - voice-cloning
  - gradio
  - fastapi

Tortoise TTS with Voice Cloning

A text-to-speech application with voice cloning, powered by Tortoise-TTS.

Description

This application generates natural-sounding speech from text. You can choose the voice in one of three ways:

  • Uploading your own voice sample for cloning
  • Recording your voice directly in the browser
  • Selecting from a variety of preset voices

The app uses the Tortoise-TTS text-to-speech model and runs on Hugging Face Spaces with ZeroGPU, which allocates a GPU on demand for each request.

How to Use

Web Interface

  1. Enter the text you want to convert to speech
  2. Choose one of the following voice options:
    • Upload a voice sample audio file (WAV format recommended)
    • Record your voice using your microphone
    • Select a preset voice from the dropdown menu
  3. Click "Generate Speech"
  4. Listen to or download the generated audio

API Endpoints

The app also provides REST API endpoints for programmatic access:

  1. Voice File TTS - /api/tts_with_voice_file/

    • POST request with:
      • text: Text to convert to speech (required)
      • voice_file: Audio file for voice cloning (optional)
      • preset_voice: Name of preset voice (optional, defaults to "random")
  2. Preset Voice TTS - /api/tts_with_preset/

    • POST request with:
      • text: Text to convert to speech (required)
      • preset_voice: Name of preset voice (required)

Python Example

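import requests

# Request speech from the preset-voice endpoint
response = requests.post(
    "https://your-space-name.hf.space/api/tts_with_preset/",
    data={"text": "Hello, this is a test.", "preset_voice": "tom"},
    timeout=120,  # generation can take 30-60 seconds
)
response.raise_for_status()  # fail on an HTTP error instead of saving an error body

# Save the audio file
with open("output.wav", "wb") as f:
    f.write(response.content)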
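
Voice File Example

The voice-file endpoint can be called in much the same way. The sketch below is a rough illustration, not the app's reference client. It assumes the endpoint accepts a multipart form with the field names listed under API Endpoints and returns the audio bytes directly, as the preset endpoint does. The helper names `build_voice_file_form` and `synthesize`, the `audio/wav` content type, and the 300-second timeout are all assumptions.

```python
from pathlib import Path
from typing import Optional


def build_voice_file_form(text: str, voice_path: Optional[str] = None,
                          preset_voice: str = "random") -> dict:
    """Build the data/files keyword arguments for requests.post (hypothetical helper)."""
    if not text.strip():
        raise ValueError("text is required")
    kwargs = {"data": {"text": text, "preset_voice": preset_voice}}
    if voice_path is not None:
        path = Path(voice_path)
        # requests accepts a (filename, content, content_type) tuple for uploads.
        # The "audio/wav" content type is an assumption about what the app expects.
        kwargs["files"] = {"voice_file": (path.name, path.read_bytes(), "audio/wav")}
    return kwargs


def synthesize(base_url: str, out_path: str, **form_args) -> None:
    """POST to the voice-file endpoint and save the returned audio (hypothetical helper)."""
    import requests  # third-party; imported here so the form helper has no dependencies

    response = requests.post(
        f"{base_url}/api/tts_with_voice_file/",
        timeout=300,  # assumed generous limit; generation is slow
        **build_voice_file_form(**form_args),
    )
    response.raise_for_status()
    Path(out_path).write_bytes(response.content)
```

A call might look like `synthesize("https://your-space-name.hf.space", "cloned.wav", text="Hello, this is a test.", voice_path="my_voice.wav")`. Omitting `voice_path` sends only the text and the preset name, which defaults to "random".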

Technical Details

This app leverages:

  • Tortoise-TTS: State-of-the-art text-to-speech model
  • Gradio: For the intuitive user interface
  • FastAPI: For the API endpoints
  • ZeroGPU: For on-demand GPU allocation on Hugging Face Spaces

Limitations

  • Speech generation may take some time (30-60 seconds) depending on text length
  • Voice cloning quality depends on the clarity and length of the provided sample
  • For best results, provide voice samples with clear speech and minimal background noise
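
Since sample length affects cloning quality, it can help to inspect a voice sample before uploading it. The sketch below uses Python's standard-library `wave` module, which reads uncompressed PCM WAV files only. The function name `describe_wav` is hypothetical, and the app itself does not necessarily perform any such check.

```python
import wave


def describe_wav(path: str) -> dict:
    """Report basic properties of a PCM WAV file (hypothetical helper)."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "duration_s": frames / rate,   # length of the recording in seconds
            "sample_rate_hz": rate,
            "channels": wav.getnchannels(),
        }
```

For example, `describe_wav("my_voice.wav")["duration_s"]` gives the clip length, which you can compare against whatever minimum you find works well for cloning.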

Credits

This project uses the Tortoise-TTS model. If you use this app in your work, please consider citing:

@misc{tortoise-tts,
  author = {James Betker},
  title = {Tortoise-TTS: A Multi-Voice TTS System},
  year = {2022},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/neonbjb/tortoise-tts}}
}

License

This project is available under the Apache-2.0 License.