RSHVR commited on
Commit
4a58eca
·
verified ·
1 Parent(s): 3958987

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +26 -76
README.md CHANGED
@@ -17,91 +17,41 @@ tags:
17
  - fastapi
18
  ---
19
 
20
- # Tortoise TTS with Voice Cloning
 
21
 
22
- A powerful text-to-speech application with voice cloning capabilities, powered by Tortoise-TTS.
23
 
24
- ## Description
 
 
 
 
25
 
26
- This application allows you to generate high-quality, natural-sounding speech from text. You can customize the voice by either:
27
- - Uploading your own voice sample for cloning
28
- - Recording your voice directly in the browser
29
- - Selecting from a variety of preset voices
30
 
31
- The app uses Tortoise-TTS, a high-quality text-to-speech model, and runs efficiently on Hugging Face Spaces with Zero-GPU optimization.
32
-
33
- ## How to Use
34
-
35
- ### Web Interface
36
-
37
- 1. Enter the text you want to convert to speech
38
- 2. Choose one of the following voice options:
39
- - Upload a voice sample audio file (WAV format recommended)
40
- - Record your voice using your microphone
41
- - Select a preset voice from the dropdown menu
42
- 3. Click "Generate Speech"
43
- 4. Listen to or download the generated audio
44
-
45
- ### API Endpoints
46
-
47
- The app also provides REST API endpoints for programmatic access:
48
-
49
- 1. **Voice File TTS** - `/api/tts_with_voice_file/`
50
- - POST request with:
51
- - `text`: Text to convert to speech (required)
52
- - `voice_file`: Audio file for voice cloning (optional)
53
- - `preset_voice`: Name of preset voice (optional, defaults to "random")
54
-
55
- 2. **Preset Voice TTS** - `/api/tts_with_preset/`
56
- - POST request with:
57
- - `text`: Text to convert to speech (required)
58
- - `preset_voice`: Name of preset voice (required)
59
-
60
- ### Python Example
61
-
62
- ```python
63
- import requests
64
-
65
- # Using preset voice
66
- response = requests.post(
67
- "https://your-space-name.hf.space/api/tts_with_preset/",
68
- data={"text": "Hello, this is a test.", "preset_voice": "tom"}
69
- )
70
-
71
- # Save the audio file
72
- with open("output.wav", "wb") as f:
73
- f.write(response.content)
74
- ```
75
 
76
  ## Technical Details
77
 
78
- This app leverages:
79
- - **Tortoise-TTS**: State-of-the-art text-to-speech model
80
- - **Gradio**: For the intuitive user interface
81
- - **FastAPI**: For the API endpoints
82
- - **Zero-GPU**: For efficient GPU utilization on Hugging Face Spaces
83
-
84
- ## Limitations
85
-
86
- - Text generation may take some time (30-60 seconds) depending on text length
87
- - Voice cloning quality depends on the clarity and length of the provided sample
88
- - For best results, provide voice samples with clear speech and minimal background noise
89
 
90
- ## Credits
 
 
91
 
92
- This project uses the Tortoise-TTS model. If you use this app in your work, please consider citing:
93
 
94
- ```
95
- @misc{tortoise-tts,
96
- author = {James Betker},
97
- title = {Tortoise-TTS: A Multi-Voice TTS System},
98
- year = {2022},
99
- publisher = {GitHub},
100
- journal = {GitHub repository},
101
- howpublished = {\url{https://github.com/neonbjb/tortoise-tts}}
102
- }
103
- ```
104
 
105
- ## License
106
 
107
- This project is available under the Apache-2.0 License.
 
 
 
 
17
  - fastapi
18
  ---
19
 
20
+ # Voice Chat Assistant
21
+ A conversational voice assistant powered by AI that responds to your spoken queries with natural-sounding speech.
22
 
23
+ ## Features
24
 
25
+ - Speech Recognition: Uses OpenAI's Whisper model to accurately transcribe your voice
26
+ - Natural Language Understanding: Leverages Cohere's LLM API for intelligent responses
27
+ - Text-to-Speech: Generates natural speech using Tortoise-TTS
28
+ - Reply on Pause: Automatically responds when you finish speaking
29
+ - Conversation History: Maintains context throughout your dialogue
30
 
31
+ ## Demo
32
+ Speak into your microphone and the assistant will respond with voice!
 
 
33
 
34
+ ## How It Works
35
+ - Your voice is transcribed to text using Whisper
36
+ - The text is processed by Cohere's LLM to generate a response
37
+ - The response is converted to speech using Tortoise-TTS
38
+ - The conversation continues with full context retention
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
39
 
40
  ## Technical Details
41
 
42
+ This project utilizes:
 
 
 
 
 
 
 
 
 
 
43
 
44
+ - Zero-GPU: Efficient GPU memory usage with Hugging Face's Zero-GPU technology
45
+ - FastRTC: Real-time communication for seamless voice interaction
46
+ - Gradio: Simple and intuitive user interface
47
 
48
+ ## Setup
49
 
50
+ To run this locally, you'll need a Cohere API key and Python 3.8+.
 
 
 
 
 
 
 
 
 
51
 
52
+ ## Acknowledgements
53
 
54
+ OpenAI for the Whisper speech recognition model
55
+ Cohere for the language model API
56
+ Tortoise-TTS for the text-to-speech capabilities
57
+ Hugging Face for the Spaces and Zero-GPU infrastructure