RSHVR committed on
Commit 12d303c (verified) · 1 Parent(s): f3c69f5

Update README.md

Files changed (1)
  1. README.md +94 -1
README.md CHANGED
@@ -9,6 +9,99 @@ app_file: app.py
9
  pinned: false
10
  license: apache-2.0
11
  short_description: Text-to-speech using Gradio, FastAPI, and TorToise TTS
12
  ---
13
 
14
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
9
  pinned: false
10
  license: apache-2.0
11
  short_description: Text-to-speech using Gradio, FastAPI, and TorToise TTS
12
+ tags:
13
+ - tortoise-tts
14
+ - text-to-speech
15
+ - voice-cloning
16
+ - gradio
17
+ - fastapi
18
  ---
19
 
20
+ # Tortoise TTS with Voice Cloning
21
+
22
+ A powerful text-to-speech application with voice cloning capabilities, powered by Tortoise-TTS.
23
+
24
+ ## Description
25
+
26
+ This application allows you to generate high-quality, natural-sounding speech from text. You can customize the voice by either:
27
+ - Uploading your own voice sample for cloning
28
+ - Recording your voice directly in the browser
29
+ - Selecting from a variety of preset voices
30
+
31
+ The app uses Tortoise-TTS, a high-quality text-to-speech model, and runs on Hugging Face Spaces using ZeroGPU.
32
+
33
+ ## How to Use
34
+
35
+ ### Web Interface
36
+
37
+ 1. Enter the text you want to convert to speech
38
+ 2. Choose one of the following voice options:
39
+    - Upload a voice sample audio file (WAV format recommended)
40
+    - Record your voice using your microphone
41
+    - Select a preset voice from the dropdown menu
42
+ 3. Click "Generate Speech"
43
+ 4. Listen to or download the generated audio
44
+
45
+ ### API Endpoints
46
+
47
+ The app also provides REST API endpoints for programmatic access:
48
+
49
+ 1. **Voice File TTS** - `/api/tts_with_voice_file/`
50
+    - POST request with:
51
+      - `text`: Text to convert to speech (required)
52
+      - `voice_file`: Audio file for voice cloning (optional)
53
+      - `preset_voice`: Name of preset voice (optional, defaults to "random")
54
+
55
+ 2. **Preset Voice TTS** - `/api/tts_with_preset/`
56
+    - POST request with:
57
+      - `text`: Text to convert to speech (required)
58
+      - `preset_voice`: Name of preset voice (required)
59
+
60
+ ### Python Example
61
+
62
+ ```python
63
+ import requests
64
+
65
+ # Using preset voice
66
+ response = requests.post(
67
+     "https://your-space-name.hf.space/api/tts_with_preset/",
68
+     data={"text": "Hello, this is a test.", "preset_voice": "tom"},
69
+ )
70
+
71
+ # Save the audio file
72
+ with open("output.wav", "wb") as f:
73
+     f.write(response.content)
74
+ ```
75
+
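The Python example above covers only the preset endpoint. A minimal sketch of a request to `/api/tts_with_voice_file/` might look like the following. It assumes the endpoint accepts `multipart/form-data` carrying the `text`, `voice_file`, and `preset_voice` fields listed under API Endpoints. The helper name `build_voice_file_request`, the `audio/wav` part type, and the default filename are illustrative and not part of the app. It uses only the standard library and builds the request without sending it.

```python
import uuid
import urllib.request


def build_voice_file_request(base_url, text, voice_bytes=None,
                             filename="sample.wav", preset_voice=None):
    """Build (but do not send) a multipart POST for the voice-file endpoint.

    Assumption: the server reads form fields named `text`, `voice_file`,
    and `preset_voice`, as described in the README.
    """
    boundary = uuid.uuid4().hex
    parts = []

    def add_field(name, value):
        # Plain text form field.
        parts.append(
            (f"--{boundary}\r\n"
             f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
             f"{value}\r\n").encode("utf-8")
        )

    add_field("text", text)
    if preset_voice is not None:
        add_field("preset_voice", preset_voice)
    if voice_bytes is not None:
        # File part; audio/wav is an assumed content type.
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="voice_file"; '
            f'filename="{filename}"\r\n'
            "Content-Type: audio/wav\r\n\r\n"
        ).encode("utf-8")
        parts.append(header + voice_bytes + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))

    return urllib.request.Request(
        base_url.rstrip("/") + "/api/tts_with_voice_file/",
        data=b"".join(parts),
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )
```

To send it, pass the result to `urllib.request.urlopen(req, timeout=180)` and write the response bytes to a file. The generous timeout is a guess based on the 30-60 second generation time noted under Limitations.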
76
+ ## Technical Details
77
+
78
+ This app leverages:
79
+ - **Tortoise-TTS**: State-of-the-art text-to-speech model
80
+ - **Gradio**: For the intuitive user interface
81
+ - **FastAPI**: For the API endpoints
82
+ - **ZeroGPU**: For efficient GPU utilization on Hugging Face Spaces
83
+
84
+ ## Limitations
85
+
86
+ - Speech generation can take 30-60 seconds, depending on text length
87
+ - Voice cloning quality depends on the clarity and length of the provided sample
88
+ - For best results, provide voice samples with clear speech and minimal background noise
89
+
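Since cloning quality depends on sample length, it can help to check a WAV file before uploading it. The sketch below reads the duration with Python's standard `wave` module. The helper names and the `min_seconds` threshold are hypothetical; the README does not state a minimum length, so treat the default as a placeholder to tune.

```python
import io
import wave


def wav_duration_seconds(wav_bytes):
    """Return the duration of an uncompressed WAV file given as bytes."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def check_voice_sample(wav_bytes, min_seconds=6.0):
    """Return (is_long_enough, duration_in_seconds).

    min_seconds is a hypothetical threshold, not a documented requirement.
    """
    duration = wav_duration_seconds(wav_bytes)
    return duration >= min_seconds, duration
```

For example, `check_voice_sample(open("my_voice.wav", "rb").read())` reports whether a local recording (the filename is a placeholder) meets the chosen threshold. The check covers length only and cannot measure clarity or background noise.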
90
+ ## Credits
91
+
92
+ This project uses the Tortoise-TTS model. If you use this app in your work, please consider citing:
93
+
94
+ ```
95
+ @misc{tortoise-tts,
96
+ author = {James Betker},
97
+ title = {Tortoise-TTS: A Multi-Voice TTS System},
98
+ year = {2022},
99
+ publisher = {GitHub},
100
+ journal = {GitHub repository},
101
+ howpublished = {\url{https://github.com/neonbjb/tortoise-tts}}
102
+ }
103
+ ```
104
+
105
+ ## License
106
+
107
+ This project is available under the Apache-2.0 License.